Stardog is a graph database: fast, lightweight, pure Java for mission-critical, enterprise apps. Check out the Quick Start Guide to get Stardog installed and running in five easy steps.
Introduction
Stardog is a fast, lightweight, agile semantic graph database—equally adept in client-server, middleware, and embedded modes. Stardog 2.2.4 (11 December 2014) supports the RDF graph data model; SPARQL 1.1 query language; HTTP and SNARL protocols for remote access and control; OWL 2 and rules for inference and data analytics; and programmatic interaction via several languages and network interfaces.
To learn more about where we’ve been and where we’re headed, consult the release notes and milestones.
1. Downloading Stardog
Download Stardog to get started. The Stardog support forum is the place to report bugs, ask questions, etc. You can also ask questions on Stack Overflow using the tag stardog.
2. Contributing Open Source
There are many open source subsystems of Stardog; feel free to submit pull requests: stardog-docs, stardog.js, stardog-groovy, stardog-spring, stardog.rb, and stardog-clj. Many thanks to everyone who’s contributed so far.[1]
3. Stardog Roadmap
| 3.0 release (February 2015) |
| 3.1 release (Q2 2015) |
| 4.0 release (Q2 2016) |
4. Alternate Versions of this Manual
This is the manual for Stardog 2.2.4 (11 December 2014).[2] Alternate versions of this manual are coming in the 3.0 release.
| Version | EPUB3 | Kindle | |
| 2.2.4 | ✗ | ✗ | ✗ |
Quick Start Guide
Stardog runs on Java 6 and Java 7, and should run fine on Java 8. Stardog runs best on, but does not require, a 64-bit JVM that supports sun.misc.Unsafe.
Warning: Stardog ships with an insecure but usable default setting: the super user is admin and the admin password is "admin".
This is fine until it isn’t, at which point you should read the Security section.
5. Linux and OSX
-
Tell Stardog where its home directory (where databases and other files will be stored) is:
$ export STARDOG_HOME=/data/stardog
If you’re using some weird Unix shell that doesn’t create environment variables in this way, adjust accordingly. If STARDOG_HOME isn’t defined, Stardog will use the Java user.dir property value.
Note: You should consider the upgrade process when setting STARDOG_HOME for production or other serious usage. In particular, you probably don’t want to set the directory where you install Stardog as STARDOG_HOME, as that makes upgrading less easy. Set STARDOG_HOME to some other location.
-
Copy the stardog-license-key.bin into the right place:
$ cp stardog-license-key.bin $STARDOG_HOME
Of course stardog-license-key.bin has to be readable by the Stardog process. Stardog won’t run without a valid stardog-license-key.bin in STARDOG_HOME.
-
Start the Stardog server. By default the server will expose SNARL and HTTP interfaces on port 5820.[3]
$ ./stardog-admin server start
-
Create a database with an input file:
$ ./stardog-admin db create -n myDB examples/data/University0_0.owl
-
Query the database:
$ ./stardog query myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
You can use the Web Console to search or query the new database you created by visiting http://localhost:5820/myDB in your browser.
Now, go have a drink: you’ve earned it.
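If the server refuses to start, the first two steps are the usual suspects. The following is a minimal Python sketch of a pre-flight check, not anything that ships with Stardog: the function names are hypothetical, and falling back to the current working directory only approximates the Java user.dir behavior described above.

```python
import os
from pathlib import Path

LICENSE_NAME = "stardog-license-key.bin"

def resolve_stardog_home(env=None):
    """Return STARDOG_HOME if set, else the current working directory.

    The manual says Stardog falls back to Java's user.dir; the Python
    working directory is used here as an approximation of that.
    """
    env = os.environ if env is None else env
    home = env.get("STARDOG_HOME")
    return Path(home) if home else Path.cwd()

def preflight(env=None):
    """Return a list of problems that would keep the server from starting."""
    home = resolve_stardog_home(env)
    if not home.is_dir():
        return [f"{home} is not a directory"]
    problems = []
    license_file = home / LICENSE_NAME
    if not license_file.is_file():
        problems.append(f"{LICENSE_NAME} not found in {home}")
    elif not os.access(license_file, os.R_OK):
        problems.append(f"{LICENSE_NAME} is not readable")
    return problems
```

An empty list from preflight() means the home directory and license key look sane; it says nothing about whether the license itself is valid.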
6. Windows
Windows…really? Okay, but don’t blame us if this hurts… The following steps are carried out using the Windows command prompt.
First, tell Stardog where its home directory (where databases and other files will be stored) is:
> set STARDOG_HOME=C:\data\stardog
Second, copy the stardog-license-key.bin into the right place:
> COPY /B stardog-license-key.bin %STARDOG_HOME%
The /B is required to perform a binary copy or the license file may get
corrupted. Of course stardog-license-key.bin has to be readable by the
Stardog process. Finally, Stardog won’t run without a valid stardog-license-key.bin
in STARDOG_HOME.
Third, start the Stardog server. By default the server will expose SNARL and HTTP interfaces on port 5820.[4]
> stardog-admin.bat server start
This will start the server in the current command prompt; leave this window open and open a new command prompt window to continue.
Fourth, create a database with an input file:
> stardog-admin.bat db create -n myDB examples/data/University0_0.owl
Fifth, query the database:
> stardog.bat query myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
You can use the Web Console to search or query the new database you created by hitting http://localhost:5820/myDB in your browser.
You should drink the whole bottle, brave Windows user!
Using Stardog
Stardog’s primary purpose is to execute queries against RDF data which it has under direct management.
Stardog will not retrieve data from the Web or from any other network via HTTP URLs in order to query that data. If you want to query data using Stardog, you must add that data to a new or existing Stardog database.
Stardog supports SPARQL, the W3C standard for querying RDF graphs.
7. Querying
Stardog supports SPARQL 1.1 [5] and also the OWL 2 Direct Semantics entailment regime.
To execute a SPARQL query against a Stardog database, use the query
subcommand:
$ stardog query myDb "select * where { ?s ?p ?o }"
Detailed information on using the query command in Stardog can be found
on its man page.
7.1. DESCRIBE
SPARQL’s DESCRIBE keyword is deliberately underspecified;
vendors are free to do, for good or bad, whatever they want.
In Stardog a DESCRIBE <theResource> query retrieves the predicates and
objects for all the triples for which <theResource> is the subject.
There are, of course, about seventeen thousand other ways to implement
DESCRIBE; we’ve implemented four or five of them and may expose them to
users in a future release of Stardog based on user feedback and requests.
Now you know and knowing is one-quarter of the fun.
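To make the DESCRIBE behavior concrete, here is a small Python sketch over an in-memory list of triples. It only models the rule stated above (predicates and objects of triples whose subject is the resource); the data and the describe function are made up for illustration and are not Stardog code.

```python
def describe(triples, resource):
    """Return (predicate, object) pairs of every triple whose subject is `resource`."""
    return [(p, o) for (s, p, o) in triples if s == resource]

# Hypothetical sample data, written as (subject, predicate, object) tuples.
triples = [
    ("ex:book1", "dc:title", "David Copperfield"),
    ("ex:book1", "dc:creator", "Edmund Wells"),
    ("ex:book2", "dc:title", "Another Book"),
    ("ex:shelf", "ex:holds", "ex:book1"),
]

result = describe(triples, "ex:book1")
```

Note that the ex:shelf triple is not part of the result: ex:book1 appears there as the object, not the subject.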
7.2. Query Functions
Stardog supports all of the functions in SPARQL, as well as some others from XPath and SWRL. Any of these functions can be used in queries or rules. Some functions appear in multiple namespaces, but all of the namespaces will work.
See SPARQL Query Functions for the complete list.
8. Updating
There are many ways to update the data in a Stardog database; the most commonly used methods are the CLI and SPARQL Update queries, both of which we discuss below.
8.1. SPARQL Update
SPARQL 1.1 Update can be used to insert RDF into or delete RDF from a Stardog
database using SPARQL query forms INSERT and DELETE, respectively.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
INSERT DATA
{ GRAPH <http://example/bookStore> { <http://example/book1> ns:price 42 } }
An example of deleting RDF:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE DATA
{
<http://example/book2> dc:title "David Copperfield" ;
dc:creator "Edmund Wells" .
}
Or they can be combined with WHERE clauses:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
WITH <http://example/addresses>
DELETE { ?person foaf:givenName 'Bill' }
INSERT { ?person foaf:givenName 'William' }
WHERE
{ ?person foaf:givenName 'Bill' }
Note: Per the SPARQL Update spec, Stardog treats Update queries as implicitly transactional and atomic. Since Stardog does not support nested transactions, it will not (currently) support an Update query in an open transaction.[6]
8.2. Adding Data with the CLI
As of Stardog 2.2.4, the most efficient way to load data into Stardog is at database creation
time. See the Creating a Database section for bulk
loading data at database creation time. To add data to an existing
Stardog database, use the add command:
$ stardog data add myDatabase 1.rdf 2.rdf 3.rdf
The optional -f (or --format) argument specifies the RDF
serialization type of the files to be loaded; if you specify the wrong
type, add will fail. If you don’t specify a type, Stardog will try to
determine the type on its own based on the file extension. For example,
files with names ending in '.ttl' will be parsed with Turtle
syntax. If you specify a type, then all the files being loaded must be of
that same type.
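The format rules can be sketched in a few lines of Python. This is an illustration only: the manual confirms just the '.ttl' to Turtle mapping, so the other extensions in the table, the case-insensitive match, and the resolve_formats name are assumptions.

```python
from pathlib import Path

# Only ".ttl" -> Turtle comes from the manual; the other entries are
# conventional extensions assumed for illustration.
EXTENSION_FORMATS = {
    ".ttl": "TURTLE",
    ".rdf": "RDF/XML",
    ".nt": "N-TRIPLES",
    ".trig": "TRIG",
}

def resolve_formats(files, explicit_format=None):
    """Map each file name to the RDF format it would be parsed with."""
    if explicit_format is not None:
        # With -f/--format, every file is parsed as that one format.
        return {f: explicit_format for f in files}
    formats = {}
    for f in files:
        ext = Path(f).suffix.lower()
        if ext not in EXTENSION_FORMATS:
            raise ValueError(f"cannot guess RDF format of {f!r}")
        formats[f] = EXTENSION_FORMATS[ext]
    return formats
```

This also shows why an explicit format is all-or-nothing: once it is given, file extensions are no longer consulted.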
If you want to add data to a named graph, specify it via the
--graph-uri or -g options.
8.3. Removing Data with the CLI
To remove data from a Stardog database, remove
is used by specifying
-
one Named Graph, or
-
one or more files containing RDF (in some recognized serialization format, e.g., RDF/XML, Turtle, TriG), or
-
one Named Graph and one or more RDF files.
For example,
$ stardog data remove -g http://foo myDatabase
will remove the named graph http://foo and all its triples from
myDatabase.
$ stardog data remove myDatabase 1.rdf
will remove the triples in 1.rdf from (the default graph of)
myDatabase.
$ stardog data remove -g http://foo -f TURTLE myDatabase 2.rdf 3.rdf
will remove the triples in the Turtle files 2.rdf and 3.rdf from the
named graph http://foo of myDatabase.
Strict or loose parsing may be set for the input payload by using
--strict-parsing=TRUE|FALSE.
8.4. How Stardog Handles RDF Parsing
RDF parsing in Stardog is strict: it requires typed RDF literals to
match their explicit datatypes, URIs to be well-formed, etc. In some
cases, strict parsing isn’t ideal; it may be disabled using
--strict-parsing=FALSE.
However, even with strict parsing disabled, Stardog’s RDF parser may
encounter parse errors from which it cannot recover. And loading data in
lax mode may lead to unexpected SPARQL query results. For example,
malformed literals ("2.5"^^xsd:int) used in filter evaluation may lead
to undesired results.
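To see why "2.5"^^xsd:int counts as malformed, here is a hedged Python sketch of the kind of check a strict parser performs for xsd:int. It follows the XML Schema definition (optional sign, decimal digits, 32-bit range) and assumes whitespace has already been normalized; it is not Stardog's parser.

```python
import re

# xsd:int lexical form: optional sign followed by decimal digits.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")
_INT_MIN, _INT_MAX = -2**31, 2**31 - 1

def is_valid_xsd_int(lexical):
    """True if `lexical` is a legal xsd:int value (well-formed and in 32-bit range)."""
    if not _INT_LEXICAL.fullmatch(lexical):
        return False
    return _INT_MIN <= int(lexical) <= _INT_MAX
```

With strict parsing on, a literal that fails a check like this is rejected at load time; with it off, the literal gets in and can surprise you later in a FILTER.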
9. Versioning
Stardog supports a graph change management capability that lets users track changes between revisions of a Stardog database, add comments and other metadata to the revisions, extract diffs between those revisions, tag revisions with labels, and query over the revision history of the database using SPARQL.
Versioning support for a database is disabled by default but can be enabled at any time by setting the configuration option versioning.enabled to true. For example, you can create a database with versioning support as follows:
$ stardog-admin db create -o versioning.enabled=true -n myDb
This option can also be set after database creation using the stardog-admin metadata set command.
The following examples give a very brief overview of this capability; see the VCS man pages for all the details.
9.1. Committing Changes
Commit a new version by adding and removing triples specified in files.
Unlike the data add and data remove commands, commit lets you add and
remove triples in a single commit and associate a commit message with it.
Note: Removals are performed before additions.
To commit changes:
$ stardog vcs commit --add add_file1.ttl add_file2.ttl --remove remove_file.ttl -m "This is an example commit" myDb
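The ordering matters when the same triple shows up in both the --add and --remove files. A minimal set-based Python sketch of that ordering (the apply_commit name and the sample triple are hypothetical):

```python
def apply_commit(graph, additions, removals):
    """Return the new set of triples: removals are applied first, then additions."""
    return (set(graph) - set(removals)) | set(additions)

t = ("ex:book1", "ns:price", "42")
graph = {t}

# The same triple is listed for removal and for addition.
after = apply_commit(graph, additions=[t], removals=[t])
```

Because removals run first, the triple survives the commit; with the opposite order it would be gone.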
9.2. Viewing Revisions
To see all revisions (commits) in a database:
$ stardog vcs list myDb
$ stardog vcs list --committer userName myDb
The output can be tweaked using --after, --before, and --committer.
9.3. Reverting Revisions
You can revert specific revisions, ranges, etc.
$ stardog vcs revert myDb
$ stardog vcs revert myDb de44369d-cc7b-4244-a3fb-3f6e271420b0
9.4. Viewing Diffs
You can also see the differences between revisions. With no revision argument, diff shows the changes between the head version and its previous version; with a revision ID, it shows the changes in that specific commit:
$ stardog vcs diff myDb
$ stardog vcs diff myDb de44369d-cc7b-4244-a3fb-3f6e271420b0
Note: Diffs are represented as SPARQL Update queries so that they may be used as a kind of graph patch.
9.5. Using Tags
You can also create, drop, and list tags, i.e., named revisions:
$ stardog vcs tag --list myDb
9.6. Querying the Revision History
The revision history of the database is represented as RDF using the W3C PROV vocabulary and can be queried using SPARQL:[7]
$ stardog vcs query myDb 'SELECT...'
10. Exporting
To export data from a Stardog database back to RDF,
export is used by specifying
-
the connection string of the database to export
-
the export format:
N-TRIPLES, RDF/XML, TURTLE, TRIG. The default is N-TRIPLES. TRIG must be used when exporting the entire database if the database contains triples inside named graphs
-
optionally, the URI of the named graph to export if you wish to export a single named graph only
-
a file to export to
For example,
$ stardog data export --format TURTLE myDatabase myDatabase_output.ttl
$ stardog data export --graph-uri http://example.org/context myDatabase myDatabase_output.nt
11. Searching
Stardog includes an RDF-aware semantic search capability: it will index RDF literals and supports information retrieval-style queries over indexed data.
11.1. Indexing Strategy
The indexing strategy creates a "search document" per RDF literal. Each document consists of the following fields: literal ID; literal value; and contexts.
11.2. Search in SPARQL
We use the predicate http://jena.hpl.hp.com/ARQ/property#textMatch to
access the search index in a SPARQL query.
For example,
SELECT DISTINCT ?s ?score
WHERE {
?s ?p ?l.
( ?l ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ( 'mac' 0.5 50 ).
}
This query selects the top 50 literals, and their scores, which match
'mac' and whose scores are above a threshold of 0.5. These literals are
then joined with the generic BGP ?s ?p ?l to get the resources (?s)
that have those literals. Alternatively, you could use
?s rdf:type ex:Book if you only wanted to select the books which
reference the search criteria; you can include as many other BGPs as
you like to enhance your initial search results.
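The two stages of that query (pick the best-scoring literals, then join them back to resources) can be mimicked in plain Python. This is a sketch of the semantics only: the scores are invented, the function names are hypothetical, and treating "above a threshold" as strictly greater-than is an assumption.

```python
def text_match(scored_literals, threshold, limit):
    """Keep hits scoring above `threshold`, best first, at most `limit` of them."""
    hits = [(lit, score) for lit, score in scored_literals if score > threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]

def join_subjects(triples, hits):
    """Join hits with the pattern ?s ?p ?l to find the resources holding each literal."""
    scores = dict(hits)
    return sorted({(s, scores[o]) for (s, p, o) in triples if o in scores})

# Made-up index scores for the search term 'mac'.
scored = [("MacBook", 0.9), ("machine", 0.4), ("Mac mini", 0.7)]
triples = [
    ("ex:p1", "ex:name", "MacBook"),
    ("ex:p2", "ex:name", "Mac mini"),
    ("ex:p3", "ex:name", "machine"),
]

hits = text_match(scored, threshold=0.5, limit=50)
rows = join_subjects(triples, hits)
```

Adding another BGP such as ?s rdf:type ex:Book corresponds to filtering the joined rows further.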
11.3. Searching with the Command Line
First, check out the
search man page:
$ stardog help query search
Okay, now let’s do a search over the O’Reilly book catalog in RDF for everything mentioning "html":
$ stardog query search -q "html" -l 10 catalog
The results?
Index Score Hit
====================
0 6.422 urn:x-domain:oreilly.com:product:9780596527402.IP
1 6.422 urn:x-domain:oreilly.com:product:9780596003166.IP
2 6.422 urn:x-domain:oreilly.com:product:9781565924949.IP
3 6.422 urn:x-domain:oreilly.com:product:9780596002251.IP
4 6.422 urn:x-domain:oreilly.com:product:9780596101978.IP
5 6.422 urn:x-domain:oreilly.com:product:9780596154066.IP
6 6.422 urn:x-domain:oreilly.com:product:9780596157616.IP
7 6.422 urn:x-domain:oreilly.com:product:9780596805876.IP
8 6.422 urn:x-domain:oreilly.com:product:9780596527273.IP
9 6.422 urn:x-domain:oreilly.com:product:9780596002961.IP
11.4. Query Syntax
Stardog search is based on Lucene 4.2.0: we support all of the search modifiers that Lucene supports, with the exception of fields.
-
wildcards: ? and *
-
fuzzy: ~ and ~ with similarity weights (e.g. foo~0.8)
-
proximities: "semantic web"~5
-
term boosting
-
booleans: OR, AND, NOT, +, and -
-
grouping
For a more detailed discussion, see the Lucene docs.
12. Obfuscating
When sharing sensitive RDF data with others, you might want to (selectively) obfuscate it so that sensitive bits are not present, but non-sensitive bits remain. For example, this feature can be used to submit Stardog bug reports using sensitive data.
Data obfuscation works much the same way as the export command and supports
the same set of arguments:
$ stardog data obfuscate myDatabase obfDatabase.ttl
By default, all URIs, bnodes, and string literals in the database will be
obfuscated using the SHA256 message digest algorithm. Non-string typed literals
(numbers, dates, etc.) are left unchanged as well as URIs from built-in
namespaces (RDF, RDFS, and OWL). It’s possible to customize obfuscation by providing a configuration file.
$ stardog data obfuscate --config obfConfig.ttl myDatabase obfDatabase.ttl
The configuration specifies which URIs and strings will be obfuscated by defining inclusion and exclusion filters. See the example configuration file provided in the distribution for details.
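For a feel of what the default obfuscation does to URIs, here is a Python sketch using SHA-256 from the standard library. The shape of the output (an example.org prefix plus a hex digest) and the obfuscate_uri name are hypothetical choices for this sketch; Stardog's actual encoding may differ.

```python
import hashlib

# Namespace URIs of RDF, RDFS, and OWL, which the manual says are left unchanged.
BUILT_IN_NAMESPACES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
)

def obfuscate_uri(uri, prefix="http://example.org/obf/"):
    """Replace a URI with a SHA-256 digest unless it is in a built-in namespace.

    The prefix-plus-hex-digest output is an assumption made for this
    sketch, not Stardog's actual output format.
    """
    if uri.startswith(BUILT_IN_NAMESPACES):
        return uri
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return prefix + digest
```

Because the digest is deterministic, the same input URI always maps to the same obfuscated URI, which is what lets an obfuscated query line up with obfuscated data.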
Once the data is obfuscated, queries written against the original data will no longer work. Stardog provides query obfuscation capability, too, so that queries can be executed against the obfuscated data. If a custom configuration file is used to obfuscate the data, then the same configuration should be used for obfuscating the queries as well:
$ stardog query obfuscate --config obfConfig.ttl myDatabase myQuery.sparql > obfQuery.ttl
13. Stardog Web Console
The Stardog Web Console is a responsive web app for the Stardog Server and for
every Stardog database that makes administration and interaction with data quick
and easy; you can access it at http://foo:5820 where foo is the name of the
machine where Stardog is running.
13.1. A Screenshot Tour…
Seriously, this is a lot more fun if you just download the damn thing and hit it with a browser!
13.1.1. Login
To log in to the Stardog Web Console, provide your username and password. If you’re an administrative user, you’ll have all the operations available; otherwise the functionality will be limited by your permissions.
13.1.5. Database Status
You can set the database online/offline with the switch included in the top
right of the action bar. Setting the switch to on will set the database
online; switching it off will set the database offline.
Note: Setting a database offline will result in downtime on all the services provided by the database, e.g. querying, searching, modifying, etc.
13.1.6. DB Actions
Within the database view, a bar with actions available on the database is included. Depending on the database status, the actions available are:
-
Takes you to the Query Panel of the database, letting you query the DB with SPARQL queries
-
Takes you to the Schema Browser of the database
-
Will render the database view in edit mode, letting you modify the database settings
-
Migrates the existing content of a legacy database to new format
-
Optimize an existing database
13.1.7. Drop a database
To drop a database click on Drop; a confirmation will appear to verify the removal.
13.1.8. Creating a new database
To create a new database click New DB in the database listing screen. A wizard will be shown to select and customize the settings of the DB. All values are optional except the database name, and all of them are pre-filled with the default values. You can finish the wizard as early as the first step: just type the database name and click Finish.
You can go through the wizard with Next and Back, setting up every section of the database options. Every option has help in a tooltip that is shown when the cursor hovers over the option label.
Once you’re done setting the database options, at any step of the wizard, just click Finish to create the database. You’ll be redirected to the database view once it has been created in Stardog.
13.1.11. Querying a database
Stardog Web Console includes a SPARQL query editor for executing queries against the database; the editor includes some canned exploration queries, too.
13.1.12. Searching a database
You can search the contents of the database using Stardog’s search capability.
13.1.13. Editing data in a database
You can edit any statements in the database (with the requisite permissions).
13.1.14. Listing in-flight queries
To list the queries currently running on the system, click Query Management in the top navbar; you’ll be redirected to an accordion-style listing of running queries. This listing is refreshed constantly to reflect the running queries in real time, so a query that has been running for a while will be shown here.
13.1.15. View an in-flight query
Clicking on the query entry in the listing will show the in-flight query’s related information, such as the user who posted the query, the database it is running on, the reasoning level used for the query and the related timestamps. The query will be shown at the bottom.
To kill an in-flight query, click on the query element in the listing to expand it and show its related information; a Kill button will be shown, which you can click to terminate the query.
13.1.17. View user permissions & roles
The user’s view lets you administer a user’s permissions and the roles it has been assigned to. To add permissions to a new resource for the user, click Add Permission and provide the information for the resource; once it has been added to the list, click on the specific allowed actions. To add permissions to a resource already in the permissions table, just click on the permission actions to add/remove.
To assign the user to a role, just type the role name in the Add role input
and click Add. The role names will be autocompleted to the ones already
existing in the system.
13.1.18. Create a new user
To create a new user click New User and provide the required information on the new User popup modal. You’ll be redirected to the user’s view once it has been created in Stardog.
13.1.20. View role permissions & users assigned to it
The role’s view lets you administer a role’s permissions and the users it has been assigned to. To add permissions to a new resource for the role, click Add Permission and provide the information for the resource; once it has been added to the list, click on the specific allowed actions. To add permissions to a resource already in the permissions table, just click on the permission actions to add/remove.
To assign a user to the role, just type the username in the Add user input and
click Add. The usernames will be autocompleted to the ones already existing
in the system.
Administering Stardog
In this chapter we describe the administration of Stardog Server and Stardog databases, including the various command-line programs, configuration options, etc.
Security is an important part of Stardog administration; it’s discussed separately (Security).
14. Command Line Interface
Stardog’s command-line interface (CLI) comes in two parts:
-
stardog-admin: administrative client
-
stardog: a user’s client
The admin and user’s tools operate on local or remote databases, using either HTTP or SNARL protocols. Both of these CLI tools are Unix-only, are self-documenting, and the help output of these tools is their canonical documentation.[8]
14.1. Help
To use the Stardog CLI tools, you can start by asking them to display help:
$ stardog help
Or:
$ stardog-admin help
These work too:
$ stardog
$ stardog-admin
14.2. Security Considerations
We divide administrative functionality into two CLI programs for
reasons of security: stardog-admin will need, in production
environments, to have considerably tighter access restrictions than
stardog.
Caution: For usability, Stardog provides a default user "admin" and
password "admin" in stardog-admin commands if no user or password
are given. This is obviously insecure; before any serious use of
Stardog is contemplated, read the Security section at least twice,
and then, minimally, change the administrative password to something
we haven’t published on the interwebs!
14.3. Command Groups
The CLI tools use "command groups" to make CLI subcommands easier to find. To print help for a particular command group, just ask for help:
$ stardog help [command_group_name]
The command groups and their subcommands:
-
data: add, remove, export;
-
query: search, execute, explain, status;
-
reasoning: explain, consistency;
-
namespace: add, list, remove;
-
server: start, stop;
-
metadata: get, set;
-
user: add, drop, edit, grant, list, permission, revoke, passwd;
-
role: add, drop, grant, list, permission, revoke;
-
db: backup, copy, create, drop, migrate, optimize, list, online, offline, repair, restore, status.
The main help command for either CLI tool will print a listing of the command groups:
usage: stardog <command> [<args>]
The most commonly used stardog commands are:
data Commands which can modify or dump the contents of a database
help Display help information
icv Commands for working with Stardog Integrity Constraint support
namespace Commands which work with the namespaces defined for a database
query Commands which query a Stardog database
reasoning Commands which use the reasoning capabilities of a Stardog database
version Prints information about this version of Stardog
See 'stardog help <command>' for more information on a specific command.
To get more information about a particular command, simply issue the help command for it including its command group:
$ stardog help query execute
Finally, everything here about command groups, commands, and online help
works for stardog-admin, too:
$ stardog reasoning consistency -u myUsername -p myPassword -r QL myDB
$ stardog-admin db migrate -u myUsername -p myPassword myDb
14.4. Autocomplete
Stardog also supports CLI autocomplete via bash
autocompletion. To install autocomplete for bash shell, you’ll
first want to make sure bash completion is installed:
14.4.1. Homebrew
To install:
$ brew install bash-completion
To enable, edit .bash_profile:
if [ -f `brew --prefix`/etc/bash_completion ]; then
. `brew --prefix`/etc/bash_completion
fi
14.4.2. MacPorts
First, you really should be using Homebrew…ya heard?
If not, then:
$ sudo port install bash-completion
Then, edit .bash_profile:
if [ -f /opt/local/etc/bash_completion ]; then
. /opt/local/etc/bash_completion
fi
14.4.5. All Platforms
Now put the Stardog autocomplete script, stardog-completion.sh, into your
bash_completion.d directory, typically one of
/etc/bash_completion.d, /usr/local/etc/bash_completion.d or ~/bash_completion.d.
Alternately you can put it anywhere you want, but tell .bash_profile
about it:
source ~/.stardog-completion.sh
14.5. How to Make a Connection String
You need to know how to make a connection string to talk to a Stardog database. A connection string may consist solely of the database name in cases where
-
Stardog is listening on the standard port(s);
-
SNARL is enabled; and
-
the command is invoked on the same machine where the server is running.
In other cases, a "fully qualified" connection string, as described below, is required.
Further, the connection string is now assumed to be the first argument of any command that requires a connection string. Some CLI subcommands require a Stardog connection string as an argument to identify the server and database upon which operations are to be performed.
Connection strings are URLs and may either be local to the machine where the CLI is run or they may be on some other remote machine.
There are two URL schemes recognized by Stardog:
-
http://
-
snarl://
The former uses Stardog’s (extended) version of SPARQL Protocol; the latter uses Stardog’s native data access protocol, called SNARL.
Note: stardog-admin and stardog work with HTTP or SNARL Protocol,
interchangeably. SNARL is faster than HTTP in cases where payloads to
and from the server are relatively small; for payloads that are
large, raw transfer time dominates and there isn’t much or any
difference in performance between them.
14.6. Example Connection Strings
To make a connection string, you need to know the URL scheme; the machine name and port Stardog Server is running on; any (optional) URL path to the database;[9] and the name of the database:
{scheme}{machineName}:{port}/{databaseName};{connectionOptions}
Here are some example connection strings:
snarl://server/billion-triples-punk
http://localhost:5000/myDatabase
http://169.175.100.5:1111/myOtherDatabase;reasoning=QL
snarl://stardog:8888/the_database
snarl://localhost:1024/db1;reasoning=NONE
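The template and examples above can be assembled mechanically. Here is a small Python sketch of a builder; the function name and its parameters are hypothetical, it omits the optional URL path, and it rejects option keys that are not lowercase, which is the rule the manual gives for connectionOptions.

```python
def make_connection_string(database, scheme="snarl", host="localhost",
                           port=None, options=None):
    """Build {scheme}://{host}[:{port}]/{database}[;key=value...]."""
    if scheme not in ("http", "snarl"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    authority = host if port is None else f"{host}:{port}"
    url = f"{scheme}://{authority}/{database}"
    for key, value in (options or {}).items():
        # Key names must be lowercase; values are case-sensitive and kept as given.
        if key != key.lower():
            raise ValueError(f"option key must be lowercase: {key!r}")
        url += f";{key}={value}"
    return url
```

For example, make_connection_string("db1", port=1024, options={"reasoning": "NONE"}) reproduces the last example string in the list above.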
Using the default ports for SNARL and HTTP protocols simplifies
connection strings. connectionOptions are a series of ; delimited
key-value pairs which themselves are = delimited. Key names must be
lowercase and their values are case-sensitive. Finally, in the
case where the scheme is SNARL, the machine is "localhost", and the
port is the default SNARL port, a connection string may consist of the
"databaseName" only.
15. Server Admin
Stardog Server is multi-protocol, supporting SNARL and HTTP. By default both protocols are served on port 5820. All administrative functions work over SNARL or HTTP protocols.
15.1. Upgrading Stardog Server
The process of installation is pretty simple; see the Quick Start Guide for details.
But how do we easily upgrade between versions? The key is judicious use of
STARDOG_HOME. Best practice is to keep installation directories for different
versions separate and use a STARDOG_HOME in another location for storing
databases.[10] Once you set your STARDOG_HOME environment
variable to point to this directory, you can simply stop the old version and
start the new version without copying or moving any files. You can
also specify the home directory using the --home argument when starting the
server.
15.2. HTTP & SNARL Server Unification
To use any of these commands against a remote server, pass a global
--server argument with an HTTP or SNARL URL.
Note: If you are running stardog-admin on the same machine where
Stardog Server is running, and you’re using the default protocol
ports, then you can omit the --server argument and simply pass a
database name via -n option. Most of the following commands assume
this case for the sake of exposition.
15.3. Server Security
See the Security section for information about Stardog’s security system, secure deployment patterns, and more.
15.4. Configuring Stardog Server
Note: The properties described in this section control the behavior of the Stardog Server (whether HTTP or SNARL protocols are in use); to set properties or other metadata on individual Stardog databases, see Database Admin.
Stardog Server’s behavior can be configured via the JVM arg
stardog.home, which sets Stardog Home, overriding the value of
STARDOG_HOME set as an environment variable. Stardog Server’s behavior
can also be configured via a stardog.properties file (a Java
Properties file) in STARDOG_HOME. To change the behavior of a
running Stardog Server, it is necessary to restart it.
The following twiddly knobs for Stardog Server are available in
stardog.properties:[11]
-
strict.parsing: Controls whether Stardog parses RDF strictly (true, the default) or loosely (false) -
query.all.graphs: Controls what data Stardog Server evaluates queries against; iftrue, it will query over the default graph and the union of all named graphs; iffalse(the default), it will query only over the default graph. -
query.timeout: Sets the upper bound for query execution time that’s inherited by all databases unless explicitly overriden. See Managing Queries section below for details. -
logging.[access,audit].[enabled,type,file]: Controls whether and how Stardog logs server events; described in detail below. -
logging.slow_query.enabled,logging.slow_query.time,logging.slow_query.type: The three slow query logging options are used in the following way. To enable logging of slow queries, setenabledtotrue. To define what counts as a "slow" query, settimeto a time duration value (positive integer plus "h", "m", "s", or "ms" for hours, minutes, seconds, or milliseconds respectively). To determine the type of logging, settypetotext(the default) orbinary. To state the obvious explicitly, alogging.slow_query.timethat exceeds the value ofquery.timeoutwill result in null logs. -
database.connection.timeout.ms: Controls how long, in milliseconds, connections may idle before being automatically closed by the server. -
bnode.preserve.id: Determines how the Stardog parser handles bnode identifiers that may be present in (some) RDF input. If this property is enabled (i.e.,TRUE), parsing and data loading performance are improved; but the other effect is that if distinct input files use (randomly or intentionally) the same bnode identifier, that bnode will point to one and the same node in the database. If you have input files that use explicit bnode identifiers, and multiple files may use the asame bnode idenitifers, and you don’t want those bnodes to be smushed into a single node in the database, then this configuration option should be disabled (set toFALSE). -
load.parser.count, load.processor.count: Determines the number of parser and processor threads, respectively, to be used during bulk loading of data at database creation time. The default values are 3 and 4 respectively, but they may be set higher, to good effect, if you have multi-core CPUs. The former is effective only if multiple input files are being processed; the latter is effective even if a single file is processed as input. The heuristic for these settings is:
-
the value of load.parser.count + load.processor.count should neither exceed 20 nor be equal to or greater than the number of available cores
-
the two values should be roughly equal
-
the values don’t have much effect unless or until you’re loading billions of triples
-
password.length.min: Sets the password policy for the minimum length of user passwords; a password can’t be shorter than password.length.min or longer than password.length.max. Default: 4. -
password.length.max: Sets the password policy for the maximum length of user passwords. Default: 1024. -
password.regex: Sets the password policy of accepted chars in user passwords, via a Java regular expression. Default: [\\w@#$%]+
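The thread-count heuristic given above for load.parser.count and load.processor.count can be checked mechanically before creating a database. The following Python sketch is illustrative only: the function name is hypothetical, and the tolerance used for "roughly equal" (a difference of at most 1) is an assumption rather than a Stardog rule.

```python
import os


def check_load_threads(parser_count, processor_count, cores=None):
    """Return a list of warnings for the bulk-load thread-count heuristic.

    `cores` defaults to the number of CPUs reported by the OS.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    total = parser_count + processor_count
    warnings = []
    if total > 20:
        warnings.append("sum exceeds 20")
    if total >= cores:
        warnings.append("sum is not below the number of available cores")
    # Assumption: "roughly equal" is read as differing by at most 1.
    if abs(parser_count - processor_count) > 1:
        warnings.append("values are not roughly equal")
    return warnings


if __name__ == "__main__":
    # The documented defaults (3 and 4) checked against a 16-core machine.
    print(check_load_threads(3, 4, cores=16))
```

An empty list means the chosen values satisfy all three checks; the heuristic itself notes that tuning matters little until loads reach billions of triples.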
15.5. Starting & Stopping the Server
|
Note
|
Unlike the other stardog-admin subcommands, starting the
server may only be run locally, i.e., on the same machine
the Stardog Server will run on.
|
The simplest way to start the server (running on the default port,
detaching to run as a daemon, and writing stardog.log to the current
working directory) is
$ stardog-admin server start
To specify parameters:
$ stardog-admin server start --logfile mystardog.log --port=8080
The port can be specified using the property --port.
The HTTP interface can be disabled by using the flag
--no-http and the SNARL interface via --no-snarl.
To shut the server down:
$ stardog-admin server stop
If you started Stardog on a port other than the default, or want to shut
down a remote server, you can simply use the --server option to
specify the location of the server to shut down.
By default Stardog will bind its server to 0.0.0.0. You can specify a different
network interface for Stardog to bind to using the --bind property
of server start.
15.6. Server Monitoring with Watchdog & JMX
Stardog’s JMX implementation is called Watchdog. In addition to providing some basic JVM information, Watchdog also exports information about the Stardog DBMS configuration as well as stats for all of the databases within the system, such as the total number of open connections, size, and average query time.
15.6.1. Accessing Watchdog
To access Watchdog, you can simply use a tool like VisualVM or
JConsole to attach to the process running the JVM, or connect directly
to the JMX server. You can also access information from Watchdog in
the web console for the database, or by performing a GET on
/{db}/watchdog which will return a simple JSON object containing the
information available via JMX.
15.6.2. Configuring Watchdog
By default, Watchdog will bind an RMI server for remote access on port
5833. If you want to change which port Watchdog binds the remote
server to, you can set the property watchdog.port via
stardog.properties. If you wish to disable remote access to JMX,
you can set watchdog.remote.access to false in
stardog.properties. Finally, if you wish to disable Watchdog
completely, set watchdog.enabled to false in stardog.properties.
15.7. Locking Stardog Home
Stardog Server will lock STARDOG_HOME when it starts to prevent
synchronization errors and other nasties if you start more than one
Stardog Server with the same STARDOG_HOME. If you need to run more
than one Stardog Server instance, choose a different STARDOG_HOME or
pass a different value to --home.
15.8. Access & Audit Logging
See the stardog.properties file (in the distribution) for a complete
discussion of how access and audit logging work in Stardog Server.
Basically, audit logging is a superset of the events in access
logging. Access logging covers the most often required logging events;
you should consider enabling audit logging if you really need to log
every server event. Logging generally doesn’t have much impact on
performance; but the safest way to ensure that impact is negligible is
to log to a separate disk (or to a centralized logging server, etc.).
The important configuration choices are whether logs should be binary or plain text (both based on ProtocolBuffer message formats); the type of logging (audit or access); the logging location (which may be "off disk" or even "off machine"); and the log rotation policy (file size or time). Logging to a centralized logging facility requires a Java plugin that implements the Stardog Server logging interface; see Java Programming for more information.
Slow query logging is also available. See the Managing Queries section below.
16. Database Admin
Stardog is a multi-tenancy system and will happily provide access to multiple, distinct databases.
16.1. Configuring a Database
To administer a Stardog database, some config options must be set at creation time; others may be changed subsequently and some may never be changed. All of the config options have sensible defaults (except, obviously, for the database name), so you don’t have to twiddle any of the knobs till you really need to.
16.2. Configuration Options
The following are the legal configuration options for a Stardog database:
-
database.name: A legal database name. -
database.online: The status of the database: online or offline. It may be set so that the database is created initially in online or offline status; subsequently, it can’t be set directly but only by using the relevant admin commands. -
icv.active.graphs: Specifies which part of the database, in terms of named graphs, is checked with IC validation. Set to tag:stardog:api:context:all to validate all the named graphs in the database. -
icv.enabled: Determines whether ICV is active for the database; if true, all database mutations are subject to IC validation (i.e., "guard mode"). -
icv.reasoning.type: Determines what kind of reasoning is used during IC validation. -
index.differential.enable.limit: Sets the minimum size of the Stardog database before differential indexes are used. -
index.differential.merge.limit: Sets the size in number of RDF triples before the differential indexes are merged to the main indexes. -
index.literals.canonical: Enables RDF literal canonicalization. See literal canonicalization for details. -
index.named.graphs: Enables optimized index support for named graphs; speeds SPARQL query evaluation with named graphs at the cost of some overhead for database loading and index maintenance. -
index.persist: Enables persistent indexes. -
index.persist.sync: Determines whether memory indexes are synchronously or asynchronously persisted to disk with respect to a transaction. -
index.statistics.update.automatic: Sets whether statistics are maintained automatically. -
index.type: Sets the index type (memory or disk). -
reasoning.consistency.automatic: Enables automatic consistency checking with respect to a transaction. -
reasoning.punning.enabled: Enables punning. -
reasoning.schema.graphs: Determines which, if any, named graph or graphs contains the "tbox", i.e., the schema part of the data. -
search.enabled: Enables semantic search on the database. -
search.reindex.mode: Sets how search indexes are maintained. -
transactions.durable: Enables durable transactions.
16.2.1. A Note About Database Status
A database must be set to offline status before most configuration
parameters may be changed. Hence, the normal routine is to set the database
offline, change the parameters, and then set the database to online. All
of these operations may be done programmatically from CLI tools, such
that they can be scripted in advance to minimize downtime. In a future
version, we will allow some properties to be set while the database
remains online.
16.2.2. Summary of Configuration Options
The following table summarizes the options:
| Option | Mutable | Default | API |
|---|---|---|---|
|
Yes |
||
|
No |
||
|
Yes |
|
|
|
No |
|
|
|
No |
|
|
|
No |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
No |
|
|
|
No |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
No |
|
|
|
Yes |
||
|
Yes |
|
|
|
No |
|
|
|
Yes |
|
|
|
Yes |
|
|
|
Yes |
|
16.2.3. Legal Values of Configuration Options
The following options take a boolean value:
database.online, icv.enabled, index.literals.canonical, index.named.graphs, index.persist, index.persist.sync, index.statistics.update.automatic, reasoning.consistency.automatic, reasoning.punning.enabled, search.enabled, transactions.durable.
The legal value of database.name is given by the regular expression
[A-Za-z]{1}[A-Za-z0-9_-]*.
The legal value of icv.active.graphs is a comma-separated list of named graph
identifiers. See reasoning.schema.graphs below for syntactic sugar
URIs for default graph and all named graphs.
The legal value of icv.reasoning.type is one of the reasoning levels
(i.e., one of the following strings): NONE, RDFS, QL, RL, EL, DL.
The legal value of index.differential.* is an integer.
The legal value of index.type is the string "disk" or "memory"
(case-insensitive).
The legal value of reasoning.schema.graphs is a comma-separated list of named graph
identifiers, including (optionally) the special names,
tag:stardog:api:context:default and tag:stardog:api:context:all,
which represent the default graph and the union of all named graphs and
the default graph, respectively. In the context of database
configurations only, Stardog will recognize default and * as shorter
forms of those URIs, respectively.
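To make the shorthand concrete, here is a small Python sketch of how a comma-separated reasoning.schema.graphs (or icv.active.graphs) value could be expanded into full graph identifiers. It only illustrates the mapping described above; the function name is hypothetical, and this is not how Stardog itself parses the option.

```python
# Short forms that, per the text above, are recognized in database configurations.
GRAPH_ALIASES = {
    "default": "tag:stardog:api:context:default",
    "*": "tag:stardog:api:context:all",
}


def expand_graph_list(value):
    """Split a comma-separated graph list and expand the short forms."""
    graphs = []
    for item in value.split(","):
        item = item.strip()
        if item:
            graphs.append(GRAPH_ALIASES.get(item, item))
    return graphs


if __name__ == "__main__":
    # http://example.org/schema is a made-up named graph for illustration.
    print(expand_graph_list("default, http://example.org/schema"))
```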
The legal value of search.reindex.mode is one of the strings sync
or async (case insensitive) or a legal
Quartz
cron expression.
16.3. Managing Database Status
Databases are either online or offline; this allows database maintenance to be decoupled from server maintenance.
16.4. Creating a Database
Stardog databases may be created locally or remotely; but, of course,
performance is better if data files don’t have to be transferred over a
network during creation and initial loading. See the section below about
loading compressed data. All data files, indexes, and server metadata
for the new database will be stored in Stardog Home. Stardog won’t
create a database with the same name as an existing database. Stardog
database names must conform to the regular expression,
[A-Za-z]{1}[A-Za-z0-9_-]*.
|
Note
|
There are four reserved words that may not be used for the
names of Stardog databases: system, admin, watchdog, and docs.
|
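A script that creates databases can check names up front. The Python sketch below is a hedged illustration: it assumes the naming pattern is meant to allow any number of trailing characters (that is, a trailing `*` on the second character class), and it treats the reserved words as case-sensitive, which the text above does not specify. The function name is hypothetical.

```python
import re

# Reserved words listed in the note above.
RESERVED_NAMES = {"system", "admin", "watchdog", "docs"}

# Assumption: one leading letter followed by zero or more letters,
# digits, underscores, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_legal_db_name(name):
    """Return True if `name` matches the pattern and is not reserved."""
    return NAME_PATTERN.fullmatch(name) is not None and name not in RESERVED_NAMES


if __name__ == "__main__":
    for candidate in ("myDb", "my_db", "9lives", "system"):
        print(candidate, is_legal_db_name(candidate))
```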
Minimally, the only thing you must know to create a Stardog database is a database name; alternatively, you may customize some other database parameters and options depending on anticipated workloads, data modeling, and other factors.
See stardog-admin help db create for all the details including
examples.
16.5. Database Archetypes
Stardog database archetypes are a new feature in 2.0. A database archetype is a named, vendor-defined or user-defined bundle of data and functionality to be applied at database-creation time. Archetypes are primarily for supporting various data standards or toolchain configurations in a simple way.
For example, the SKOS standard from W3C defines an OWL vocabulary for building taxonomies, thesauruses, etc. SKOS is made up of a vocabulary, some constraints, some kinds of reasoning, and (typically) some SPARQL queries. If you are developing an app that uses SKOS, without Stardog’s SKOS archetype, you are responsible for assembling all that SKOS stuff yourself, which is tedious, error-prone, and not very rewarding, even when it’s done right the first time.
Rather than putting that burden on Stardog users, we’ve created database archetypes as a mechanism to collect these "bundles of stuff" which, as a developer, you can then simply attach to a particular database.
The last point to make is that archetypes are composable: you can mix-and-match them at database creation time as needed.
16.5.1. SKOS Archetype
The SKOS archetype is for databases that will contain SKOS data, and includes the SKOS schema, SKOS constraints using Stardog’s Integrity Constraint Validation, and some namespace-prefix bindings.
16.5.2. PROV Archetype
The PROV archetype is for databases that will contain PROV data, and includes the PROV schema, PROV constraints using Stardog’s Integrity Constraint Validation, and some namespace-prefix bindings.
Archetypes are composable, so you can use more than one of them, and they are intended to be used alongside your domain data, which may include as many other schemas, ontologies, etc. as are required.
16.6. Database Creation Templates
As a boon to the overworked admin or devops peeps, Stardog Server supports database creation templates: you can pass a Java Properties file that sets the config values, and pass the values that are unique to a specific database (typically just the database name) as CLI parameters.
16.6.1. Examples
To create a new database with the default options by simply providing a name and a set of initial datasets to load:
$ stardog-admin db create -n myDb input.ttl another_file.rdf moredata.rdf.gz
Datasets can be loaded later as well. To create (in this case, an empty) database from a template file:
$ stardog-admin db create -c database.properties
At a minimum, the configuration file must have a value for
database.name option.
If you want to change only a few configuration options, you can provide the values for these options directly in the CLI args as follows:
$ stardog-admin db create -n db -o icv.enabled=true icv.reasoning.type=QL -- input.ttl
Note that “--” is used in this case when “-o” is the last option to delimit the value for “-o” from the files to be bulk loaded.
Please refer to the CLI help for more details of the db create
command.
16.7. Database Create Options
| Name | Description | Arg values | Default |
|---|---|---|---|
|
If present, sets all mutation operations to database as transactionally durable; durability increases the cost of all mutation operations. |
|
|
|
Specifies the kind of database indexes: memory or disk |
|
disk |
|
Specifies whether the database is searchable |
|
|
|
Specifies that the database’s indexes should be optimized for RDF triples only |
|
16.8. Repairing a Database
If an I/O error or an index exception occurs while querying a DB, the DB might be corrupted and can be repaired with the repair command. If the errors occur while executing admin commands, then the system DB might have been corrupted. System database corruptions can also cause other problems including authorization errors.
This command needs exclusive access to your Stardog home directory and therefore requires the Stardog Server not to be running. This also means that the command can only be run on the machine where the Stardog home directory is located, and you will not be able to start the Stardog Server while this command is running.
|
Note
|
The repair process can take considerable time for large databases. |
If the built-in Stardog system database is corrupted, then you
can use the database name system as the repair argument.
To repair the database myDB:
$ stardog-admin db repair myDB
To repair the system database:
$ stardog-admin db repair system
16.9. Backing Up and Restoring
Stardog includes both physical and logical backup utilities; logical backups are
accomplished using the export CLI command. Physical backups and restores are
accomplished using stardog-admin db backup and stardog-admin db restore commands,
respectively.
These tools perform physical backups, including database metadata, rather than logical backups via some RDF serialization. They are native Stardog backups and can only be restored with Stardog tools. Backup may be accomplished while a database is online; backup is performed in a read transaction: reads and writes may continue, but writes performed during the backup are not reflected in the backup.
16.9.1. Backup
stardog-admin db backup assumes a default location for its output, namely,
$STARDOG_HOME/.backup; that default may be overridden by passing
a -t or --to argument. Backup sets are stored in the backup directory by
database name and then in date-versioned subdirectories for each backup volume.
Of course you can use a variety of OS-specific options to accomplish remote
backups over some network or data protocol; those options are left as an
exercise for the admin.
To back up a Stardog database called foobar:
$ stardog-admin db backup foobar
To perform a remote backup, for example, pass in a specific directory that may be mounted in the current OS namespace via some network protocol, thus:
$ stardog-admin db backup --to /my/network/share/stardog-backups foobar
|
Note
|
Stardog’s backup/restore approach is optimized for minimizing the amount of time it takes to backup a database; the tradeoff is with restore performance. |
16.9.2. Restore
To restore a Stardog database from a Stardog backup volume, simply pass a fully-qualified path to the volume in question. The location of the backup should be the full path to the backup, not the location of the backup directory as specified in your Stardog configuration. There is no need to specify the name of the database to restore.
To restore a database from its backup:
$ stardog-admin db restore $STARDOG_HOME/.backups/myDb/2012-06-21
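Because restore needs the full path to one backup volume, a small helper can locate the most recent volume for a database. The Python sketch below assumes the layout described above (backup directory, then database name, then one subdirectory per backup) and assumes the subdirectory names are ISO dates like the 2012-06-21 in the example, so that sorting by name is also sorting by date. The function name is hypothetical.

```python
from pathlib import Path


def latest_backup(backup_root, db_name):
    """Return the newest backup volume directory for `db_name`, or None.

    Assumes volume directories are named so that lexicographic order
    matches chronological order (e.g. 2012-06-21).
    """
    db_dir = Path(backup_root) / db_name
    if not db_dir.is_dir():
        return None
    volumes = sorted(p for p in db_dir.iterdir() if p.is_dir())
    return volumes[-1] if volumes else None


if __name__ == "__main__":
    # The path below is illustrative; pass your own backup directory.
    print(latest_backup("/data/stardog/.backups", "myDb"))
```

The returned path is what would be passed to stardog-admin db restore.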
One-time Database Migrations for Backup
The backup system cannot directly back up databases created in versions before 2.1. These databases must be explicitly migrated in order to use the new backup system; this is a one-time operation per database and is accomplished by running
$ stardog-admin db migrate foobar
in order to migrate a database called foobar. Again, this is a one-time
operation only and all databases created with 2.1 (or later) do not require
it.
16.10. Namespace Prefix Bindings
SPARQL queries can be verbose; but at least the PREFIX declarations in the
prologue of each query are easy to screw up! Stardog allows database
administrators to persist and manage custom namespace prefix bindings:
-
At database creation time, if data is loaded to the database that contains namespace prefixes, then those are persisted for the life of the database. Any subsequent queries to the database may simply omit the PREFIX declarations:
$ stardog query myDB "select * {?s rdf:type owl:Class}"
-
To add new bindings, use the namespace subcommand in the CLI:
$ stardog namespace add myDb --prefix ex --uri 'http://example.org/test#'
-
To modify an existing binding, delete the existing one and then add a new one:
$ stardog namespace remove myDb --prefix ex
-
Finally, to see all of the existing namespace prefix bindings:
$ stardog namespace list myDB
If no files are used during database creation, or if the files do not
define any prefixes (e.g. NTriples), then the "Big Four" default
prefixes are stored: RDF, RDFS, XSD, and OWL.
When executing queries in the CLI, the default table format for SPARQL
SELECT results will use the bindings as qnames. SPARQL CONSTRUCT
query output (including export) will also use the stored prefixes. To reiterate,
namespace prefix bindings are per database, not global.
16.11. Index Strategies
By default Stardog builds extra indexes for named graphs. These
additional indexes are used when SPARQL queries specify datasets using
FROM and FROM NAMED. With these additional indexes, better
statistics about named graphs are also computed.
Stardog may also be configured to create and to use fewer indexes, if the database is only going to be used to store RDF triples, that is to say, if the database will not be used to store named graph information. In this mode, Stardog will maintain fewer indexes, which will result in faster database creation and faster updates without compromising query answering performance. In such databases, quads (that is, triples with named graphs or contexts specified) may still be added to the database at any time, but query performance may degrade in such cases.
To create a database which indexes only RDF triples, set the option
index.named.graphs to false at database creation time. The CLI
provides a shorthand option, -i or --index-triples-only, which is
equivalent.
|
Note
|
This option can only be set at database creation time and cannot be changed later without rebuilding the database; use this option with care. |
16.12. Differential Indexes
While Stardog is generally biased in favor of read performance, write performance is also important in many applications. In order to increase write performance, Stardog may be used, optionally, with a differential index.
Stardog’s differential index is used to persist additions and removals separately from the main indexes, such that updates to the database can be performed faster. Query answering takes into consideration all the data stored in the main indexes and the differential index; hence, query answers are computed as if all the data is stored in the main indexes.
There is a slight overhead for query answering with differential indexes
if the differential index size gets too large. For this reason, the
differential index is merged into the main indexes when its size reaches
DIFF_INDEX_MAX_LIMIT (the index.differential.merge.limit option). There is
no benefit of differential indexes if the main index itself is small. For
this reason, the differential index is not used until the main index size
reaches the value of index.differential.enable.limit.
In most cases, the default values of these parameters will
work fine and don’t need to be changed. The corollary is that you shouldn’t
change these values in a production system till you’ve tested the effects of a
change in a non-production system.
16.13. Loading Compressed Data
Stardog supports loading data from compressed files directly: there’s no need to uncompress files before loading. Loading compressed data is the recommended way to load large input files. Stardog supports GZIP and ZIP compression natively.[12]
16.13.1. GZIP and BZIP2
A file passed to create will be treated as compressed if the file name ends
with .gz or .bz2. The RDF format of the file is determined by the
penultimate extension. For example, if a file named test.ttl.gz
is used as input, Stardog will perform GZIP decompression during loading and
parse the file with the Turtle parser. All the formats supported by Stardog
(RDF/XML, Turtle, Trig, etc.) can be used with compression.
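The "penultimate extension" rule can be illustrated with a short Python sketch. This is not Stardog's loader: the function name is hypothetical, and the extension-to-format table is an assumed, partial mapping used only to show how the compression suffix is peeled off before the RDF format is chosen.

```python
import os

# Compression suffixes named in the text above.
COMPRESSION_BY_EXT = {".gz": "gzip", ".bz2": "bzip2"}

# Assumption: a partial, illustrative mapping of extensions to RDF formats.
FORMAT_BY_EXT = {
    ".ttl": "Turtle",
    ".rdf": "RDF/XML",
    ".trig": "TriG",
    ".nt": "N-Triples",
}


def classify_input(filename):
    """Return (compression, rdf_format); either may be None if unrecognized."""
    base, ext = os.path.splitext(filename)
    compression = COMPRESSION_BY_EXT.get(ext.lower())
    if compression is not None:
        # The RDF format comes from the extension before the compression suffix.
        base, ext = os.path.splitext(base)
    return compression, FORMAT_BY_EXT.get(ext.lower())


if __name__ == "__main__":
    for name in ("test.ttl.gz", "input.ttl", "moredata.rdf.gz"):
        print(name, classify_input(name))
```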
16.13.2. ZIP
The ZIP support works differently since zipped files can contain
multiple files. When an input file name ends with .zip, Stardog
performs ZIP decompression and tries to load all the files inside the
ZIP file. The RDF format of the files inside the zip is determined
by their file names as usual. If there is an unrecognized file extension
(e.g. '.txt'), then that file will be skipped.
16.14. Dropping a Database
This command removes a database and all associated files and metadata.
This means all files on disk pertaining to the database will be deleted,
so only use drop when you’re certain! Databases must be offline in
order to be dropped.
It takes as its only argument a valid database name. For example,
$ stardog-admin db drop my_db
16.15. Using Integrity Constraint Validation
Stardog supports integrity constraint validation as a data quality mechanism via closed world reasoning. Constraints can be specified in OWL, SWRL, and SPARQL. Please see the Validating Constraints section for more about using ICV in Stardog.
The CLI icv subcommand can be used to add, delete, or drop all
constraints from an existing database. It may also be used to validate
an existing database with constraints that are passed into the icv
subcommand; that is, using different constraints than the ones already
associated with the database.
For details of ICV usage, see stardog help icv and stardog-admin help icv.
For ICV in transacted mutations of Stardog databases, see the database creation
section above.
16.16. Migrating a Database
The migrate subcommand migrates an older Stardog database to the
latest version of Stardog. Its only argument is the name of the database
to migrate. migrate won’t necessarily work between arbitrary Stardog
versions, so before upgrading check the release notes for a new version
carefully to see whether migration is required or possible.
$ stardog-admin db migrate myDatabase
will update myDatabase to the latest database format.
16.17. Getting Database Information
You can get some information about a database by running the following command:
$ stardog-admin metadata get my_db_name
This will return all the metadata stored about the database, including the values of configuration options used for this database instance. If you want to get the value for a specific option then you can run the following command:
$ stardog-admin metadata get -o index.named.graphs my_db_name
16.18. Managing Queries
Stardog includes the capability to manage running queries according to configurable policies set at run-time; this capability includes support for listing running queries; deleting running queries; reading the status of a running query; killing running queries that exceed a time threshold automatically; and logging slow queries for analysis.
Stardog is pre-configured with sensible server-wide defaults for query management parameters; these defaults may be overridden or disabled per database.
16.18.1. Configuring Query Management
For many use cases the default configuration will be sufficient. But
you may need to tweak the timeout parameter to be longer or shorter,
depending on the hardware, data load, queries, throughput, etc. The
default configuration has a server-wide query timeout value of
query.timeout, which is inherited by all the databases in the server.
You can customize the server-wide timeout value and then set
per-database custom values, too. Any database without a custom value
inherits the server-wide value. To disable query timeout, set
query.timeout to 0.
16.18.2. Listing Queries
To see all running queries, use the query list subcommand:
$ stardog-admin query list
The results are formatted tabularly:
+----+----------+-------+--------------+
| ID | Database | User | Elapsed time |
+----+----------+-------+--------------+
| 2 | test | admin | 00:00:20.165 |
| 3 | test | admin | 00:00:16.223 |
| 4 | test | admin | 00:00:08.769 |
+----+----------+-------+--------------+
3 queries running
You can see which user owns the query (superusers can see all running queries), as well as the elapsed time and the database against which the query is running. The ID column is the key to deleting queries.
16.18.3. Deleting Queries
To delete a running query, simply pass its ID to the query kill
subcommand:
$ stardog-admin query kill 3
The output confirms the query kill completing successfully:
Query 3 killed successfully
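When deciding which queries to kill by hand, it can help to parse the tabular query list output. The Python sketch below assumes the output has exactly the four-column layout shown in the sample above; the function names and the 15-second threshold are illustrative, and the script only reports IDs rather than invoking the CLI.

```python
SAMPLE_OUTPUT = """\
+----+----------+-------+--------------+
| ID | Database | User | Elapsed time |
+----+----------+-------+--------------+
| 2 | test | admin | 00:00:20.165 |
| 3 | test | admin | 00:00:16.223 |
| 4 | test | admin | 00:00:08.769 |
+----+----------+-------+--------------+
3 queries running
"""


def parse_query_list(output):
    """Parse rows of the `query list` table into dictionaries."""
    rows = []
    for line in output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip borders, the header row, and the summary line.
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        hours, minutes, seconds = cells[3].split(":")
        elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        rows.append({"id": int(cells[0]), "database": cells[1],
                     "user": cells[2], "elapsed_s": elapsed})
    return rows


def ids_running_longer_than(rows, threshold_s):
    """Return the IDs of queries whose elapsed time exceeds the threshold."""
    return [row["id"] for row in rows if row["elapsed_s"] > threshold_s]


if __name__ == "__main__":
    print(ids_running_longer_than(parse_query_list(SAMPLE_OUTPUT), 15))
```

Each reported ID could then be passed to stardog-admin query kill.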
16.18.4. Automatically Killing Queries
For production use, especially when a Stardog database is exposed to arbitrary query input, some of which may not execute in an acceptable time period, the automatic query killing feature is useful. It will protect a Stardog Server from queries that consume too many resources.
Once the execution time of a query exceeds the value of query.timeout,
the query will be killed automatically.[13] The client that submitted the
query will receive an error message. The value of query.timeout may be
overridden by setting a different value (shorter or longer) in database
options. To disable, set query.timeout to 0.
The value of query.timeout is a positive integer concatenated with a
unit, interpreted as a time duration: 'h' (for hours),
'm' (for minutes), 's' (for seconds), or 'ms' (for milliseconds). For
example, '1h' for 1 hour, '5m' for 5 minutes, '90s' for 90 seconds, and
'500ms' for 500 milliseconds.
The default value of query.timeout is five minutes.
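The duration syntax is simple enough to validate before writing it into a configuration file. The following Python sketch converts a value such as '5m' to milliseconds; it is an illustration of the format described above, not Stardog's own parser, and the function name is hypothetical. Returning None for '0' (timeout disabled) is a choice made for this sketch.

```python
import re

# Milliseconds per unit, for the four units described above.
UNIT_TO_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# "ms" is listed first so it is tried before the single-letter "m".
DURATION_PATTERN = re.compile(r"(\d+)(ms|h|m|s)")


def duration_to_ms(value):
    """Convert a duration like '90s' to milliseconds; '0' means disabled."""
    if value == "0":
        return None
    match = DURATION_PATTERN.fullmatch(value)
    if match is None or int(match.group(1)) == 0:
        raise ValueError("not a valid duration: %r" % value)
    return int(match.group(1)) * UNIT_TO_MS[match.group(2)]


if __name__ == "__main__":
    for text in ("1h", "5m", "90s", "500ms", "0"):
        print(text, duration_to_ms(text))
```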
16.18.5. Query Status
To see more detail about an in-flight query, use the query status
subcommand:
$ stardog-admin query status 1
The resulting output includes query metadata, including the query itself:
Username: admin
Database: test
Started : 2013-02-06 09:10:45 AM
Elapsed : 00:01:19.187
Query :
select ?x ?p ?o1 ?y ?o2
where {
?x ?p ?o1.
?y ?p ?o2.
filter (?o1 > ?o2).
}
order by ?o1
limit 5
16.18.6. Slow Query Logging
Stardog does not log slow queries in the default configuration because there isn’t a single value for what counts as a "slow query", which is entirely relative to queries, access patterns, dataset sizes, etc. While slow query logging has very minimal overhead, what counts as a slow query in some context may be quite acceptable in another. See Configuring Stardog Server above for the details.
16.18.7. Protocols and Java API
For HTTP protocol support, see Stardog’s Apiary docs.
For Java, see the Javadocs.
16.18.8. Security and Query Management
The security model for query management is very simple: any user can kill any running query submitted by that user, and a superuser can kill any running query. The same general restriction is applied to query status; you cannot see status for a query that you do not own, and a superuser can see the status of every query.
16.19. Managing Search Indexes
Stardog’s search service is described in the Using Stardog section. However, managing search indexes is an administrative task and, thus, is described here.
There are three modes for rebuilding indexes:
-
sync: Recompute the search index synchronously with a transacted write. -
async: Recompute the search index asynchronously as soon as possible with respect to a transacted write. -
Scheduled: Use a cron expression to specify when the search index should be updated.
This is specified when creating a database by setting the property
search.reindex.mode to sync, async, or to a valid cron expression.
The default is sync.
16.20. ACID Transactions
What follows is specific guidance with respect to Stardog’s transactional semantics and guarantees.[14]
16.20.1. Atomicity
Databases may provide a guarantee of atomicity—groups of database actions (i.e., mutations) are irreducible and indivisible: either all of the changes happen or none of them happens. Stardog’s transacted writes are atomic. Stardog does not support nested transactions.[15]
16.20.2. Consistency
Data stored should be valid with respect to the data model (in this case, RDF) and to the guarantees offered by the database, as well as to any application-specific integrity constraints that may exist. Stardog’s transactions are guaranteed not to violate integrity constraints during execution. A transaction that would leave a database in an inconsistent or invalid state is aborted.
See the Validating Constraints section for a more detailed consideration of Stardog’s integrity constraint mechanism.
16.20.3. Isolation
A Stardog connection will run in
READ
COMMITTED isolation level if it has not started an explicit
transaction and will run in READ COMMITTED SNAPSHOT isolation level
if it has started a transaction. In either mode, uncommitted changes
will only be visible to the connection that made the changes: no other
connection can see those values before they are committed. Thus,
"dirty reads" can never occur. Neither mode locks the database; if
there are conflicting changes, the latest commit
wins.[16]
The difference between READ COMMITTED and READ COMMITTED SNAPSHOT isolation
levels is that in the former case a connection will see updates committed by
another connection immediately, whereas in the latter case a connection will see
a transactionally consistent snapshot of the data as it existed at the start of
the transaction and will not see any updates.
We illustrate the difference between these two levels with the following example
where initially the database contains a single triple :x :value 1.
Time |
Connection 1 |
Connection 2 |
Connection 3 |
|
|
|
|
|
|
||
|
|
||
|
|
|
|
|
|
||
|
|
||
|
|
|
|
|
|
||
|
|
||
|
|
|
|
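The two isolation levels can also be illustrated with a toy in-memory model. The Python sketch below is purely hypothetical and is not how Stardog implements transactions: it stores a single value (standing in for the object of :x :value), lets a connection without an explicit transaction read the latest committed value (READ COMMITTED), and lets a connection inside a transaction read the value as of the transaction's start plus its own uncommitted write (READ COMMITTED SNAPSHOT).

```python
class ToyStore:
    """Hypothetical single-value store; not Stardog's implementation."""

    def __init__(self, value):
        self.committed = value


class Connection:
    def __init__(self, store):
        self.store = store
        self.in_tx = False
        self.snapshot = None
        self.pending = None

    def begin(self):
        # Remember the committed value as of the start of the transaction.
        self.in_tx = True
        self.snapshot = self.store.committed
        self.pending = None

    def write(self, value):
        if not self.in_tx:
            raise RuntimeError("this toy model only writes inside a transaction")
        self.pending = value  # visible only to this connection until commit

    def read(self):
        if not self.in_tx:
            return self.store.committed  # READ COMMITTED
        if self.pending is not None:
            return self.pending          # own uncommitted change
        return self.snapshot             # READ COMMITTED SNAPSHOT

    def commit(self):
        if self.pending is not None:
            self.store.committed = self.pending  # latest commit wins
        self.in_tx = False
        self.snapshot = None
        self.pending = None


if __name__ == "__main__":
    store = ToyStore(1)
    c1, c2, c3 = Connection(store), Connection(store), Connection(store)
    c2.begin()
    c3.begin()
    c3.write(2)
    print(c1.read(), c2.read(), c3.read())  # uncommitted write seen only by c3
    c3.commit()
    print(c1.read(), c2.read())             # c1 sees the commit; c2 keeps its snapshot
```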
16.20.4. Durability
By default Stardog’s transacted writes are not durable; in some applications transactional durability is required and, thus, should be enabled.
16.20.5. Commit Failure Autorecovery
Stardog’s transaction framework is largely maintenance free; but there are some rare conditions in which manual intervention may be needed.
Stardog’s strategy for recovering automatically from (the very unlikely event of) commit failure is as follows:
-
Stardog will roll back the transaction upon a commit failure;
-
Stardog takes the affected database offline for maintenance;[17] then
-
Stardog will begin recovery, bringing the recovered database back online once that task is successful so that operations may resume.
With an appropriate logging configuration for production usage (at least
error-level logging), the preceding recovery operations will be recorded in
the log. If for whatever reason the database fails to return to online
status automatically, an administrator may use the CLI tools
(i.e., stardog-admin db online) to try to bring the database back online.
16.21. Optimizing Bulk Data Loading
Stardog tries hard to do bulk loading at database creation time in the most efficient and scalable way possible. But data loading time can vary widely, depending on factors in the data to be loaded, including the number of unique resources, etc. Here are some tuning tips that may work for you:
-
Load compressed data since compression minimizes disk access
-
Use a multicore machine since bulk loading is highly parallelized and indexes are built concurrently
-
Load multiple files together at creation time since different files will be parsed and processed concurrently improving the load speed
-
Turn off strict parsing (see Configuring a Database for the details).
-
If you are not using named graphs, use triples-only indexing.
17. Capacity Planning
The primary system resources used by Stardog are CPU, memory, and disk.[18] Stardog will take advantage of multiple CPUs, cores, and core-based threads in data loading and in throughput-heavy or multi-user loads. And obviously Stardog performance is influenced by the speed of CPUs and cores. But some workloads are bound by main memory or by disk I/O (or both) more than by CPU. In general, use the fastest CPUs you can afford with the largest secondary caches and the largest number of cores and core-based threads of execution, especially in multi-user workloads.
The following subsections provide more detailed guidance for the memory and disk resource requirements of Stardog.
17.1. Memory usage
Stardog uses system memory aggressively and the total system memory available to Stardog is often the most important factor in performance. Stardog uses both JVM memory (heap memory) and also the operating system memory outside the JVM (off heap memory). Having more system memory available is always good; however, increasing JVM memory too close to total system memory is not usually prudent as it may tend to increase Garbage Collection (GC) time in the JVM.
The following table shows recommended JVM memory and system memory requirements for Stardog.[19]
| # of Triples | JVM Memory | Off-heap memory |
|---|---|---|
| 100 million | 3GB | 3GB |
| 1 billion | 4GB | 8GB |
| 10 billion | 8GB | 64GB |
| 20 billion | 16GB | 128GB |
| 50 billion | 16GB | 256GB |
Out of the box, the Stardog CLI sets the maximum JVM memory to 2GB. This setting
works fine for most small databases (up to, say, 100 million
triples). As the database size increases, we recommend increasing JVM
memory. You can increase the JVM memory for Stardog by setting the
STARDOG_JAVA_ARGS environment variable using the standard JVM options. For
example, you can set this variable to "-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
to increase the JVM memory to 4GB and off-heap memory to 8GB. We recommend
setting the minimum heap size (-Xms option) as close to
the max heap size (-Xmx option) as possible.
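As a rough aid, the table and the STARDOG_JAVA_ARGS format above can be combined into a small lookup. This is a sketch only: the function name recommended_java_args is hypothetical, and rounding up to the next table row is an assumption, not a documented Stardog rule.

```python
# (max triples, JVM heap in GB, off-heap in GB), copied from the table above.
MEMORY_TABLE = [
    (100_000_000, 3, 3),
    (1_000_000_000, 4, 8),
    (10_000_000_000, 8, 64),
    (20_000_000_000, 16, 128),
    (50_000_000_000, 16, 256),
]


def recommended_java_args(triples):
    """Return a STARDOG_JAVA_ARGS value for the smallest table row that fits."""
    for max_triples, heap_gb, off_heap_gb in MEMORY_TABLE:
        if triples <= max_triples:
            # Minimum and maximum heap are set to the same value, as recommended.
            return (
                f"-Xms{heap_gb}g -Xmx{heap_gb}g "
                f"-XX:MaxDirectMemorySize={off_heap_gb}g"
            )
    raise ValueError("triple count is beyond the sizes covered by the table")
```

For one billion triples this yields the same string as the example in the text.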
17.1.1. System Memory and JVM Memory
Stardog uses an off-heap, custom memory allocation scheme. Please note
that the memory provisioning recommendations above cover two kinds of memory
allocation for the JVM in which Stardog will run. The first, "JVM memory", is
memory that the JVM will manage explicitly. The second, "Off-heap memory", is
memory that Stardog will manage explicitly, i.e., off the JVM heap, but about
which the JVM should be notified via the MaxDirectMemorySize property. In most
cases, the memory provisioned this way should be somewhat less than the total
memory available to the underlying operating system, as requirements dictate.
17.2. Disk usage
Stardog stores data on disk in a compressed format. The disk space needed for a database depends on many factors besides the number of triples, including the number of unique resources and literals in the data, average length of resource identifiers and literals, and how much the data is compressed. The following table shows typical disk space used by a Stardog database.
| # of triples | Disk space |
|---|---|
| 1 billion | 70GB to 100GB |
| 10 billion | 700GB to 1TB |
These numbers are given for information purposes only; the actual disk usage for a database may be significantly different in practice. Also it is important to note that the amount of disk space needed at creation time for bulk loading data is higher as temporary files will be created. The additional disk space needed at bulk loading time can be 40% to 70% of the final database size.
Disk space used by a database is non-trivially smaller if triples-only indexing is used. Triples-only indexing reduces overall disk space used by 25% on average; however, note the tradeoff: SPARQL queries involving named graphs perform significantly better with quads indexing.
The disk space used by Stardog is additive for multiple databases and there is very little disk space used other than what is required for the databases. To calculate the total disk space needed for multiple databases, one may sum the disk space needed by each database.
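The figures in this section can be turned into a back-of-the-envelope estimator. The sketch below assumes that disk usage scales linearly with the number of triples between the table rows, which the text does not promise; the function names are hypothetical, and actual usage may differ significantly, as noted above.

```python
GB_PER_BILLION = (70.0, 100.0)   # typical range from the table above
BULK_LOAD_EXTRA = (0.40, 0.70)   # extra temporary space needed during bulk load
TRIPLES_ONLY_SAVING = 0.25       # average reduction with triples-only indexing


def estimate_disk_gb(triples, triples_only=False):
    """Return a (low, high) estimate in GB, assuming linear scaling."""
    scale = triples / 1_000_000_000
    low, high = (gb * scale for gb in GB_PER_BILLION)
    if triples_only:
        low *= 1 - TRIPLES_ONLY_SAVING
        high *= 1 - TRIPLES_ONLY_SAVING
    return low, high


def estimate_bulk_load_peak_gb(triples, triples_only=False):
    """Final size plus the 40% to 70% of temporary space used while loading."""
    low, high = estimate_disk_gb(triples, triples_only)
    return low * (1 + BULK_LOAD_EXTRA[0]), high * (1 + BULK_LOAD_EXTRA[1])


def estimate_total_gb(databases):
    """Sum the estimates for several databases: (triples, triples_only) pairs."""
    estimates = [estimate_disk_gb(t, only) for t, only in databases]
    return sum(e[0] for e in estimates), sum(e[1] for e in estimates)
```

Because disk usage is additive across databases, estimate_total_gb simply sums the per-database ranges.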
18. Using Stardog on Windows
Stardog provides batch (.bat) files for use on Windows; they
provide roughly the same functionality as the Bash
scripts used on Unix-like systems. There are, however, a few small
differences between the two. When you start a server with
server start on Windows, the server does not detach to the background;
it runs in the current console.
To shut down the server correctly, you should either issue a
server stop command from another window or press Ctrl+C (and then
Y when asked to terminate the batch job). Do not under any
circumstance close the window without shutting down the server. This
will simply kill the process without shutting down Stardog, which could
cause your database to be corrupted.
The .bat scripts for Windows support our standard STARDOG_HOME and
STARDOG_JAVA_ARGS environment variables, which can be used to control
where Stardog’s database is stored and, usually, how much memory is given
to Stardog’s JVM when it starts. By default, the script uses the JVM
referenced by the JAVA_HOME environment variable. If JAVA_HOME is not set,
the script simply executes java from the directory from which Stardog is
run.
18.1. Running Stardog as a Windows Service
You can run Stardog as a Windows Service using the following configuration. Note that the following assumes commands are executed from a Command Prompt with administrative privileges.
18.1.1. Installing the Service
Change the directory to the Stardog installation directory:
cd c:\stardog-$VERSION
18.1.2. Configuring the Service
The default settings with which the service will be installed are
-
2048 MB of RAM
-
STARDOG_HOME is the same as the installation directory
-
the name of the installed service will be "Stardog Service"
-
Stardog service will write logs to the "logs" directory within the installation directory
To change these settings, set appropriate environment variables:
-
STARDOG_MEMORY: the amount of memory in MB (e.g., set STARDOG_MEMORY=4096)
-
STARDOG_HOME: the path to STARDOG_HOME (e.g., set STARDOG_HOME=c:\\stardog-home)
-
STARDOG_SERVICE_DISPLAY_NAME: a different name to be displayed in the list of services (e.g., set STARDOG_SERVICE_DISPLAY_NAME=Stardog Service)
-
STARDOG_LOG_PATH: a path to a directory where the log files should be written (e.g., set STARDOG_LOG_PATH=c:\\stardog-logs)
If you have changed the default administrator password, you also
need to modify stop-service.bat and specify the new username and
password there (by passing -u and -p parameters in the line that
invokes stardog-admin server stop).
18.1.3. Installing Stardog as a Service
Run the install-service.bat script.
At this point the service has been installed, but it is not running. To
run it, see the next section or use any Windows mechanism for
controlling the services (e.g., type services.msc on the command
line).
18.1.4. Starting, Stopping, & Changing Service Configuration
Once the service has been installed, execute stardog-serverw.exe,
which will allow you to configure the service (e.g., set whether the
service is started automatically or manually), manually start and stop
the service, as well as to configure most of the service parameters.
High Availability Cluster
In this section we explain how to configure, use, and administer Stardog Cluster for uninterrupted operations.
Stardog Cluster is a collection of Stardog Server instances running on one or
more virtual or physical machines that, from the client’s perspective, behave
like a single Stardog Server instance. To fully achieve this effect
requires DNS (i.e., with SRV records) and proxy configuration that’s left as
an exercise for the user. Of course Stardog Cluster should have some different
operational properties, the main one of which is high availability. But from the
client’s perspective Stardog Cluster should be indistinguishable from
non-clustered Stardog.[20]
|
Note
|
High Availability requires at least three nodes in the Cluster. Stardog Cluster works best, with respect to fault resiliency, with a cluster size that is an odd number greater than or equal to three: 3, 5, 7, etc.[21] With respect to performance, larger cluster sizes perform better than smaller ones. |
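One common reason for preferring odd sizes is majority-based coordination, which is how a ZooKeeper ensemble stays available. The short sketch below assumes that majority rule; it illustrates the arithmetic only and is not a statement about Stardog Cluster internals. The function names are hypothetical.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """Nodes that can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)
```

Under this assumption, going from 3 to 4 nodes does not raise the number of failures that can be tolerated, whereas going from 3 to 5 does.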
19. Configuration
To deploy Stardog Cluster you use stardog-admin commands and some additional
configuration. Stardog Cluster depends on Apache ZooKeeper. In the following
installation notes, we install Stardog Cluster in a 3-node configuration on a
LAN. If you need a larger cluster, adjust accordingly.[22]
-
Install Stardog 2.2.4 on each machine in the cluster.
|
Note
|
The smart thing to do here, of course, is to use whatever infrastructure you have in place to automate software installation. Adapting Stardog installation to Chef, Puppet, cfengine, etc. is left as an exercise for the reader. |
-
Make sure a valid Stardog license key (whether Developer, Enterprise, or a 30-day eval key) for the size of cluster you’re creating exists and resides in STARDOG_HOME on each node. You must also have a stardog.properties file with the following information for each node in the cluster:
# Flag to enable the cluster, without this flag set, the rest of the properties have no effect
pack.enabled=true
# this node's IP address (or hostname), and port where other Stardog nodes are going to connect
pack.node.address=196.69.68.1:5821
# the connection string for ZooKeeper where cluster state is stored
pack.cluster.address=196.69.68.1:2180,196.69.68.2:2180,196.69.68.3:2180
# credentials used for securing ZooKeeper state
pack.cluster.username=pack
pack.cluster.password=admin
pack.cluster.address is a ZooKeeper connection string where the cluster stores its state. pack.cluster.username and pack.cluster.password are user and password tokens for cluster communications and may be different from actual Stardog users and passwords; however, all nodes must use the same user and password combination.
-
Create the ZooKeeper configuration for each node. This config file is just a standard ZooKeeper configuration file. The following config file should be sufficient for most cases.
On node 1:
tickTime=2000
# Make sure this directory exists and
# ZK can write and read to and from it.
dataDir=/tmp/zookeeperdata/
clientPort=2180
initLimit=5
syncLimit=2
# This is an enumeration of all nodes in
# the cluster and must be identical in
# each node's config.
server.1=196.69.68.1:2888:3888
server.2=196.69.68.2:2888:3888
server.3=196.69.68.3:2888:3888
On node 2:
tickTime=2000
dataDir=/tmp/zookeeperdata/
clientPort=2180
initLimit=5
syncLimit=2
server.1=196.69.68.1:2888:3888
server.2=196.69.68.2:2888:3888
server.3=196.69.68.3:2888:3888
Finally, on node 3:
tickTime=2000
dataDir=/tmp/zookeeperdata/
clientPort=2180
initLimit=5
syncLimit=2
server.1=196.69.68.1:2888:3888
server.2=196.69.68.2:2888:3888
server.3=196.69.68.3:2888:3888
|
Note
|
The clientPort specified in zookeeper.properties and the ports used in pack.cluster.address in stardog.properties must be the same. |
-
dataDir is where ZooKeeper persists cluster state and where it writes log information about the cluster.
$ mkdir /tmp/zookeeperdata # on node 1
$ mkdir /tmp/zookeeperdata # on node 2
$ mkdir /tmp/zookeeperdata # on node 3
-
ZooKeeper requires a myid file in the dataDir folder to identify itself; create that file as follows for node 1, node 2, and node 3, respectively:
$ echo 1 > /tmp/zookeeperdata/myid # on node 1
$ echo 2 > /tmp/zookeeperdata/myid # on node 2
$ echo 3 > /tmp/zookeeperdata/myid # on node 3
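Since the per-node files differ only in a few values, they can be generated rather than edited by hand. The following is a minimal sketch, assuming the same addresses, ports, and credentials as the example above; the function names are hypothetical helpers, not part of Stardog or ZooKeeper, and the sketch returns file contents as strings rather than writing them to disk.

```python
def stardog_properties(node_ip, zk_hosts, zk_client_port=2180,
                       stardog_port=5821, username="pack", password="admin"):
    """Contents of stardog.properties for one node (cluster settings only)."""
    # The ports here must match clientPort in zookeeper.properties.
    cluster_address = ",".join(f"{host}:{zk_client_port}" for host in zk_hosts)
    return "\n".join([
        "pack.enabled=true",
        f"pack.node.address={node_ip}:{stardog_port}",
        f"pack.cluster.address={cluster_address}",
        f"pack.cluster.username={username}",
        f"pack.cluster.password={password}",
    ]) + "\n"


def zookeeper_properties(zk_hosts, zk_client_port=2180,
                         data_dir="/tmp/zookeeperdata/"):
    """Contents of zookeeper.properties; identical on every node."""
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={zk_client_port}",
        "initLimit=5",
        "syncLimit=2",
    ]
    lines += [f"server.{i}={host}:2888:3888"
              for i, host in enumerate(zk_hosts, start=1)]
    return "\n".join(lines) + "\n"


def myid_for(node_ip, zk_hosts):
    """The number to write into dataDir/myid on the given node."""
    return zk_hosts.index(node_ip) + 1
```

Deriving both files from one host list keeps the ZooKeeper client port and the server enumeration consistent across nodes.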
20. Installation
In the next few steps you will use the Stardog Admin CLI commands to deploy Stardog Cluster: that is, ZooKeeper, the Proxy, and Stardog itself.
-
To start ZooKeeper’s part of Cluster, use the stardog-admin cluster subcommand:
$ ./stardog-admin cluster zkstart --home ~/stardog # on node 1
$ ./stardog-admin cluster zkstart --home ~/stardog # on node 2
$ ./stardog-admin cluster zkstart --home ~/stardog # on node 3
This uses the zookeeper.properties config file in ~/stardog and logs its output to ~/stardog/zookeeper.log. If your $STARDOG_HOME is set to ~/stardog, then you don’t need to specify the --home option. For more info about the command:
$ ./stardog-admin help cluster zkstart
Once ZooKeeper is started, you can start Stardog Cluster:
$ ./stardog-admin server start --home ~/stardog --port 5821 # on node 1
$ ./stardog-admin server start --home ~/stardog --port 5821 # on node 2
$ ./stardog-admin server start --home ~/stardog --port 5821 # on node 3
Again, if your $STARDOG_HOME is set to ~/stardog, you don’t need to specify the --home option.
|
Note
|
We start Stardog here on the non-default port (5821) so that the Proxy can run on the default port (5820), which means that Stardog clients can act normally (i.e., use the default port, 5820) since they need to interact with the Proxy. |
-
Start the Stardog Cluster Proxy:
$ ./stardog-admin cluster proxystart --zkconnstr 196.69.68.1:2180,196.69.68.2:2180,196.69.68.3:2180 \
--user pack --password admin --port 5820 # on node 1
$ ./stardog-admin cluster proxystart --zkconnstr 196.69.68.1:2180,196.69.68.2:2180,196.69.68.3:2180 \
--user pack --password admin --port 5820 # on node 2
$ ./stardog-admin cluster proxystart --zkconnstr 196.69.68.1:2180,196.69.68.2:2180,196.69.68.3:2180 \
--user pack --password admin --port 5820 # on node 3
Note that the zkconnstr option is the same connection string as pack.cluster.address in stardog.properties, and user and password are the same as pack.cluster.username and pack.cluster.password, respectively. For more information on the proxy configuration execute:
$ ./stardog-admin help cluster proxystart
Now Stardog Cluster is running on 3 nodes, one on each of 3 machines. Since
the proxy was conveniently configured to use port 5820, you can execute
standard Stardog CLI commands against the Cluster:
$ ./stardog-admin db create -n myDb
$ ./stardog data add myDb /path/to/my/data
$ ./stardog query myDb "select * { ?s ?p ?o } limit 5"
21. Cluster Topologies & Cluster Size
In the configuration instructions above, we assume a particular Cluster
topology, which is to say, for each node n of a cluster, we run Stardog,
ZooKeeper, and a Proxy. But this is not the only topology supported by Stardog
Cluster.
ZooKeeper nodes run independently, so other topologies are possible (for example, three ZooKeeper servers and five Stardog servers); you just have to point Stardog to the corresponding ZooKeeper cluster.
To add more Stardog Cluster nodes, simply repeat the steps for Stardog on additional machines. Generally, as mentioned above, Stardog Cluster size should be an odd number greater than or equal to 3.
22. Stardog Cluster Client
To use Stardog Cluster with standard Stardog clients and CLI tools in the
ordinary way--stardog-admin and stardog--you must have Stardog installed
locally. With the provided Stardog binaries in the Stardog Cluster distribution
you can query the state of Cluster:[23]
$ ./stardog-admin --server snarl://<ipaddress>:5820/ cluster info
where ipaddress is the IP address of any of the nodes in the cluster. This
will print the available nodes in the cluster, as well as the roles (participant
or coordinator). You can also input the proxy IP address and port to get the
same information.
To add or remove data, issue stardog data add or remove commands to
any node in the cluster. Queries can be issued to any node in the cluster using
the stardog query command. All the stardog-admin features are also available
in Cluster, which means you can use any of the commands to create
databases, administer users, and use the rest of the functionality.
23. Stardog Cluster Guarantees
Stardog Cluster implements an atomic commitment protocol based on two-phase commit (2PC) over a shared replicated memory that’s provided by Apache ZooKeeper. A cluster is composed of a set of Stardog servers running together. One of the servers is known as the Coordinator and the rest as Participants.
In case the Coordinator fails at any point, a new Coordinator will be elected
out of the remaining available Participants. Stardog Cluster supports both
read (e.g., querying) and write (e.g., adding data) requests. Read requests
are load-balanced over the available Participants, whereas write requests are
transparently forwarded to and handled by the Coordinator. In some future
release we may change the protocol implemented by the Cluster and thus change
some of the allowable topologies, including multiple-writers and
multiple-readers.
When a client commits a transaction (containing a list of write requests), it
will be acknowledged by the Coordinator only after every non-failing Participant
has committed the transaction. If a Participant fails during the process of
committing a transaction, it will be expelled from the cluster by the
Coordinator and put in a temporary failed state.
If the Coordinator fails during the process, the transaction will be aborted,
and a new Coordinator will be elected automatically. Since failed nodes are
not used for any subsequent read or write requests, if a commit is
acknowledged by the Coordinator, then Stardog Cluster guarantees that the data
has been accordingly modified at every available node in the cluster.
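The acknowledgement rule described above can be illustrated with a deliberately simplified model. This is not Stardog's protocol: it omits the prepare phase of two-phase commit, ZooKeeper, and Coordinator election, and the class names ToyParticipant and ToyCoordinator are hypothetical. It only shows that a commit is acknowledged after every non-failing Participant has applied it and that failing Participants are expelled.

```python
class ToyParticipant:
    """A node that either applies a commit or fails while doing so."""

    def __init__(self, name, fails_on_commit=False):
        self.name = name
        self.fails_on_commit = fails_on_commit
        self.data = []

    def commit(self, writes):
        if self.fails_on_commit:
            raise RuntimeError(f"{self.name} failed during commit")
        self.data.extend(writes)


class ToyCoordinator:
    """Forwards a transaction to every available node, then acknowledges."""

    def __init__(self, participants):
        self.available = list(participants)
        self.failed = []  # expelled nodes, in a temporary failed state

    def commit(self, writes):
        still_available = []
        for node in self.available:
            try:
                node.commit(writes)
                still_available.append(node)
            except RuntimeError:
                # Expel the node; it serves no further reads or writes.
                self.failed.append(node)
        self.available = still_available
        # Acknowledge only now, after every non-failing node has committed.
        return True
```

After an acknowledged commit in this model, every node still in the available list holds the same data, which mirrors the guarantee stated above.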
While this approach is less performant with respect to write operations than the eventual consistency used by other distributed databases, those databases typically offer a much less expressive data model than Stardog, which makes an eventual consistency model more appropriate for those systems. But since Stardog’s data model is not only richly expressive but rests in part on provably correct semantics, we think that a strong consistency model is worth the cost.[24]
Security
Stardog’s security model is based on standard role-based access control: users have permissions over resources during sessions; permissions can be grouped into roles; and roles can be assigned to users.
Stardog uses Apache Shiro for authentication, authorization, and session management and jBCrypt for password hashing.
24. Resources
A resource is some Stardog entity or service to which access is
controlled. Resources are identified by their type and their name. A
particular resource is denoted as type_prefix:name. The valid resource
types with their prefixes are shown below.
| Resource | Prefix | Description |
|---|---|---|
| User | user | A user (e.g., user:admin) |
| Role | role | A role assigned to a user (e.g., role:reader) |
| Database | db | A database (e.g., db:theDatabase) |
| Database Metadata | | Metadata of a database |
| Database Admin | admin | Database admin tasks (e.g., admin:theDatabase) |
| Integrity Constraints | icv-constraints | Integrity constraints associated with a database (e.g., icv-constraints:theDatabase) |
25. Permissions
Permissions are composed of a permission subject, an action, and a permission object, which is interpreted as "the subject resource can perform the specified action over the object resource".
Permission subjects can be of type user or role only. Permission
objects can be of any valid type. Valid actions include the following:
-
read: permits reading the resource properties
-
write: permits changing the resource properties
-
create: permits creating new resources
-
delete: permits deleting a resource
-
grant: permits granting permissions over a resource
-
revoke: permits revoking permissions over a resource
-
execute: permits executing administration actions over a database
-
all: special action type that permits all previous actions over a resource
25.1. Wildcards
Stardog understands the use of wildcards to represent sets of resources. A
wildcard is denoted with the character *. Wildcards can be used to
create complex permissions; for instance, we can give a user the ability
to create any database by granting it a create permission over db:*.
Similarly, wildcards can be used in order to revoke multiple permissions
simultaneously.
25.2. Superusers
It is possible at user-creation time to specify that a given user is a
superuser. Being a superuser is equivalent to having been granted an
all permission over every resource, i.e., *:*. Therefore, as
expected, superusers are allowed to perform any valid action over any
existing (or future) resource.
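How wildcards and the all action combine can be sketched as a small permission check. This is a toy model under stated assumptions, not Stardog's authorization code: permissions are represented as (subject, action, object) tuples, a wildcard is assumed to replace a whole type or a whole name, and the function names are hypothetical.

```python
def _part_matches(pattern, value):
    # "*" stands for any type or any name; otherwise require equality.
    return pattern == "*" or pattern == value


def resource_matches(pattern, resource):
    """Compare 'type:name' strings, honoring '*' in either position."""
    p_type, p_name = pattern.split(":", 1)
    r_type, r_name = resource.split(":", 1)
    return _part_matches(p_type, r_type) and _part_matches(p_name, r_name)


def is_permitted(grants, subject, action, resource):
    """True if some grant gives the subject this action over the resource."""
    for g_subject, g_action, g_object in grants:
        if g_subject != subject:
            continue
        if g_action not in ("all", action):
            continue
        if resource_matches(g_object, resource):
            return True
    return False
```

In this model a superuser is simply a subject holding the single grant ("all", "*:*"), which covers every action over every existing or future resource.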
25.3. Database Owner Default Permissions
When a user creates a resource, it is automatically granted delete,
write, read, grant, and revoke permissions over the new
resource. If the new resource is a database, then the user is
additionally granted write, read, grant, and revoke permissions
over icv-constraints:theDatabase and execute permission over
admin:theDatabase. These latter two permissions give the owner of the
database the ability to administer the ICV constraints for the database
and to administer the database itself.
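The default grants listed above can be enumerated mechanically. The sketch below uses a hypothetical (subject, action, object) tuple representation and a hypothetical function name; it only restates the rules in this section and does not call any Stardog API.

```python
def owner_default_grants(user, resource):
    """Permissions granted automatically to the creator of a resource."""
    grants = [(user, action, resource)
              for action in ("delete", "write", "read", "grant", "revoke")]
    r_type, r_name = resource.split(":", 1)
    if r_type == "db":
        # Database owners can also manage ICV constraints for the database...
        grants += [(user, action, f"icv-constraints:{r_name}")
                   for action in ("write", "read", "grant", "revoke")]
        # ...and run administrative actions on it.
        grants.append((user, "execute", f"admin:{r_name}"))
    return grants
```

For a new database this yields ten grants; for any other resource type it yields five.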
25.4. Default Security Configuration
|
Warning
|
Out of the box, the Stardog security setup is minimal and
insecure: user:admin with password set to "admin" is a
superuser; user:anonymous with password "anonymous" has the "reader"
role; role:reader allows read of any resource.
|
Do not deploy Stardog in production or in hostile environments with the default security settings.
25.5. Setting Password Constraints
To set up the constraints used to validate passwords when adding new users,
configure the following settings in the stardog.properties configuration file.
-
password.length.min: Sets the password policy for the minimum length of user passwords; the value can’t be less than 1 or greater than password.length.max. Default: 4.
-
password.length.max: Sets the password policy for the maximum length of user passwords; the value can’t be greater than 1024 or less than 1. Default: 20.
-
password.regex: Sets the password policy of accepted chars in user passwords, via a Java regular expression. Default: [\\w@#$%]+
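To see what the default policy accepts, the three settings can be approximated in Python. This is a sketch under assumptions: Stardog evaluates a Java regular expression, and here re.ASCII is used so that \w behaves like Java's default; the requirement that the whole password match the expression is also an assumption, and make_password_validator is a hypothetical name.

```python
import re


def make_password_validator(min_length=4, max_length=20, regex=r"[\w@#$%]+"):
    """Build a checker from the three policy settings (defaults as above)."""
    if not (1 <= min_length <= max_length <= 1024):
        raise ValueError("need 1 <= min_length <= max_length <= 1024")
    pattern = re.compile(regex, re.ASCII)

    def is_valid(password):
        # Both the length limits and the character policy must hold.
        return (min_length <= len(password) <= max_length
                and pattern.fullmatch(password) is not None)

    return is_valid
```

With the defaults, a password such as "p@ss#1" passes, while one containing a space or fewer than four characters does not.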
25.6. Using a Password File
To avoid putting passwords into scripts or environment variables, you can put them into a suitably secured password file. If no credentials are passed explicitly in CLI invocations and you do not ask Stardog to prompt you for credentials interactively, then it will look for credentials in a password file.
On a Unix system, Stardog will look for a file called .sdpass in the
home directory of the user Stardog is running as; on a Windows system,
it will look for sdpass.conf in Application Data\stardog in the
home directory of the user Stardog is running as. If the file is not
found in these locations, Stardog will look in the location provided
by the stardog.passwd.file system property.
25.6.1. Password File Format
The format of the password file is as follows:
-
any line that starts with a # is ignored
-
each line contains a single password in the format: hostname:port:database:username:password
-
wildcards, *, are permitted for any field but the password field; colons and backslashes in fields are escaped with \.
For example,
#this is my password file; there are no others like it and this one is mine anyway...
*:*:*:flannery:aNahthu8
*:*:summercamp:jemima:foh9Moaz
Of course you should secure this file carefully, making sure that only the user that Stardog runs as can read it.
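The format rules above are easy to exercise with a small parser. This is an illustrative sketch, not Stardog's own reader: the function names are hypothetical, blank lines are skipped as a convenience, and returning the first matching line is an assumption, since the text does not say how multiple matches are resolved.

```python
def split_fields(line):
    """Split on unescaped colons; a backslash escapes the next character."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])
            i += 2
            continue
        if ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def parse_password_file(text):
    """Return (hostname, port, database, username, password) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        fields = split_fields(line)
        if len(fields) == 5:
            entries.append(tuple(fields))
    return entries


def find_password(entries, host, port, database, user):
    """Return the password of the first entry matching all four fields."""
    wanted = (host, str(port), database, user)
    for *patterns, password in entries:
        if all(p == "*" or p == w for p, w in zip(patterns, wanted)):
            return password
    return None
```

Applied to the example file above, a lookup for user jemima on database summercamp returns the second entry's password, while user flannery matches on any host, port, and database.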
26. Managing Stardog Securely
Stardog resources can be managed securely by using the tools included in the admin CLI or by programming against Stardog APIs. In this section we describe the permissions required to manage various Stardog resources either by CLI or API.
26.1. Users
-
Create a user: create permission over user:*. Only superusers can create other superusers.
-
Delete a user: delete permission over the user.
-
Enable/Disable a user: user must be a superuser.
-
Change password of a user: user must be a superuser or user must be trying to change its own password.
-
Check if a user is a superuser: read permission over the user or user must be trying to get its own info.
-
Check if a user is enabled: read permission over the user or user must be trying to get its own info.
-
List users: superusers can see all users. Other users can see only users over which they have a permission.
26.2. Roles
-
Create a role: create permission over role:*.
-
Delete a role: delete permission over the role.
-
Assign a role to a user: grant permission over the role and user must have all the permissions associated with the role.
-
Unassign a role from a user: revoke permission over the role and user must have all the permissions associated with the role.
-
List roles: superusers can see all roles. Other users can see only roles they have been assigned or over which they have a permission.
26.3. Databases
-
Create a database: create permission over db:*.
-
Delete a database: delete permission over db:theDatabase.
-
Add/Remove integrity constraints to a database: write permission over icv-constraints:theDatabase.
-
Verify a database is valid: read permission over icv-constraints:theDatabase.
-
Online/Offline a database: execute permission over admin:theDatabase.
-
Migrate a database: execute permission over admin:theDatabase.
-
Optimize a database: execute permission over admin:theDatabase.
-
List databases: superusers can see all databases. Regular users can see only databases over which they have a permission.
26.4. Permissions
-
Grant a permission: grant permission over the permission object and user must have the permission that it is trying to grant.
-
Revoke a permission from a user or role over an object resource: revoke permission over the permission object and user must have the permission that it is trying to revoke.
-
List user permissions: user must be a superuser or user must be trying to get its own info.
-
List role permissions: user must be a superuser or user must have been assigned the role.
27. Deploying Stardog Securely
To ensure that Stardog’s RBAC access control implementation will be effective, all non-administrator access to Stardog databases should occur over network (i.e., non-native) database connections.[25]
To ensure the confidentiality of user authentication credentials when using remote connections, the Stardog server should only accept connections that are encrypted with SSL.
27.1. Configuring Stardog to use SSL
Stardog HTTP server includes native support for SSL. The SNARL server (via
snarls://) also supports SSL. To enable Stardog to optionally support SSL
connections, just pass --enable-ssl to the server start command. If you want
to require the server to use SSL only, that is, to reject any non-SSL
connections, then use --require-ssl.
When starting from the command line, Stardog will use the standard Java properties for specifying keystore information:
-
javax.net.ssl.keyStorePassword (the password)
-
javax.net.ssl.keyStore (location of the keystore)
-
javax.net.ssl.keyStoreType (type of keystore, defaults to JKS)
These properties are checked first in stardog.properties; then in JVM args
passed in from the command line, e.g. -Djavax.net.ssl.keyStorePassword=mypwd. If
you’re creating a Server programmatically via ServerBuilder, you can specify
values for these properties using the appropriate ServerOptions when creating
the server. These values will override anything specified in
stardog.properties or via normal JVM args.
27.2. Configuring Stardog Client to use SSL
Stardog HTTP client supports SSL when the https: scheme is used in the
database connection string; likewise, it uses SSL for SNARL when the connection
string uses the snarls: scheme. For example, the following invocation of the
Stardog command line utility will initiate an SSL connection to a remote
database:
$ stardog status -c https://stardog.example.org/sp2b_10k
If the client is unable to authenticate to the server, then the connection will fail and an error message like the following will be generated.
Error during connect. Cause was SSLPeerUnverifiedException: peer not authenticated
The most common cause of this error is that the server presented a certificate that was not issued by an authority that the client trusts. The Stardog HTTP client driver uses standard Java security components to access a store of trusted certificates. By default, it trusts a list of certificates installed with the Java runtime environment, but it can be configured to use a custom trust store.[26]
The client driver can be directed to use a specific Java KeyStore file as a
trust store by setting the javax.net.ssl.trustStore system property. To
address the authentication error above, that trust store should contain the
issuer of the server’s certificate. Standard Java tools can create such a file.
The following invocation of the keytool utility creates a new trust store
named my-truststore.jks and initializes it with the certificate in
my-trusted-server.crt. The tool will prompt for a passphrase to associate with
the trust store. This is not used to encrypt its contents, but can be used to
ensure its integrity.[27]
$ keytool -importcert -keystore my-truststore.jks -alias stardog-server -file my-trusted-server.crt
The following Stardog command line invocation uses the newly created truststore.
$ STARDOG_JAVA_ARGS="-Djavax.net.ssl.trustStore=my-truststore.jks" stardog \
status -c https://stardog.example.org/sp2b_10k
For custom Java applications that use the Stardog HTTP client driver, the system property can be set programmatically or when the JVM is initialized.
The most common deployment approach requiring a custom trust store is
when a self-signed certificate is presented by the Stardog server. For
connections to succeed, the Stardog client must trust the self-signed
certificate. To accomplish this with the examples given above, the
self-signed certificate should be in the my-trusted-server.crt file in
the keytool invocation.
A client may also fail to authenticate to the server if the hostname in the Stardog database connection string does not match a name contained in the server certificate.[28]
This will cause an error message like the following:
Error during connect. Cause was SSLException: hostname in certificate didn't match
The client driver does not support connecting when there’s a mismatch; therefore, the only workarounds are to replace the server’s certificate or modify the connection string to use an alias for the same server that matches the certificate.
OWL & Rule Reasoning
In this chapter we describe how to use Stardog’s reasoning capabilities; we address some common problems and known issues. We also describe Stardog’s approach to query answering with reasoning in some detail, as well as a set of guidelines that contribute to efficient query answering with reasoning. If you are not familiar with the terminology, you can peruse the section on terminology.
The semantics of Stardog’s reasoning is based in part on the OWL 2 Direct Semantics Entailment Regime. However, the implementation of Stardog’s reasoning system is worth understanding as well. Stardog performs reasoning in a lazy and late-binding fashion: it does not materialize inferences; but, rather, reasoning is performed at query time according to a user-specified "reasoning level". This approach allows for maximum flexibility[29] while maintaining excellent performance and scalability.
28. Reasoning Levels
Stardog supports several reasoning levels; the reasoning level determines the kinds of inference rules or axioms that are to be considered during query evaluation:
| NONE | No axioms or rules are considered; no reasoning is performed. |
| RDFS | For the OWL 2 axioms allowed in RDF Schema (mainly subclasses, subproperties, domain, and ranges). |
| QL | For OWL 2 QL axioms. |
| RL | For OWL 2 RL axioms. |
| EL | For OWL 2 EL axioms. |
| DL | For OWL 2 DL axioms. |
| SL | For a combination of RDFS, QL, RL, and EL axioms, plus SWRL rules. |
29. Using Reasoning
In order to perform query evaluation with reasoning, Stardog requires a schema[30] to be present in the database. Since schemas are serialized as RDF, they are loaded into a Stardog database in the same way that any RDF is loaded into a Stardog database. Also, note that, since the schema is just more RDF triples, it may change as needed: it is neither fixed nor compiled in any special way.
The schema may reside in the default graph, in a specific named graph,
or in a collection of graphs. You can tell Stardog where the schema is
by setting the reasoning.schema.graphs property to one or more named
graph URIs. If you want the default graph to be considered part of the
schema, then you can use the special built-in URI
tag:stardog:api:context:default. If you want to use all named graphs
(that is, to tell Stardog to look for the schema in every named graph),
you can use tag:stardog:api:context:all. The default value for this
property is to use the default graph only.
Note: A common source of confusion for new users is failing to (1) read this chapter, (2) realize that Stardog does not eagerly materialize inferences on data load, and (3) properly set the location of the schema (in terms of one or more named graphs or the default graph).
This design is intended to support both of Stardog’s primary use cases:
- managing the data that constitutes the schema
- reasoning with the schema during query evaluation
29.1. Query Answering
All of Stardog’s interfaces (API, network, and CLI) support reasoning during query evaluation.
29.2. Command Line
In order to evaluate queries in Stardog using reasoning via the command line, a specific reasoning level must be specified in the connection string:
$ ./stardog query "myDB;reasoning=QL" "SELECT ?s { ?s a :C } LIMIT 10"
29.3. HTTP
For HTTP, the reasoning level is specified with the other HTTP request parameters:
$ curl -u admin:admin -X GET "http://localhost:5822/myDB/query?reasoning=ql&query=..."
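As a minimal sketch, the same request URL can be assembled programmatically. The snippet below uses only Python's standard library; the helper name reasoning_query_url is hypothetical, and the host, port, and database name are simply the ones from the curl example above, not fixed defaults.

```python
from urllib.parse import urlencode

def reasoning_query_url(base, database, query, level):
    """Build a query URL that carries the reasoning level as a request parameter.

    `base` and `database` are placeholders taken from the curl example;
    adjust them for your own server.
    """
    params = urlencode({"reasoning": level, "query": query})
    return f"{base}/{database}/query?{params}"

url = reasoning_query_url(
    "http://localhost:5822",
    "myDB",
    "SELECT ?s { ?s a :C } LIMIT 10",
    "ql",
)
print(url)
```

Encoding the query with urlencode avoids hand-escaping characters such as ?, {, and spaces, which is easy to get wrong on a shell command line.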
29.4. Reasoning Connection API
In order to use the ReasoningConnection API one needs to specify a
reasoning level. See the Java Programming section for
details on specifying the reasoning level programmatically.
Currently, the API has two methods:
- isConsistent(), which can be used to check if the database is (logically) consistent with respect to the reasoning level.
- isSatisfiable(URI theURIClass), which can be used to check if the given class is satisfiable with respect to the database and reasoning level.
30. Explaining Reasoning Results
Stardog can be used to check if the current database logically entails a set of triples; moreover, Stardog can explain why this is so.[31] An explanation of an inference is the minimum set of statements explicitly stored in the database that, together with the schema and any valid inferences, logically justify the inference. Explanations are useful for understanding data, schema, and their interactions, especially when a large number of statements interact with each other to infer new statements.
Explanations can be retrieved using the CLI by providing an input file that contains the inferences to be explained:
$ stardog reasoning explain "myDB;reasoning=EL" inference_to_explain.ttl
The output is displayed in a concise syntax designed to be legible; but it can be rendered in any one of the supported RDF syntaxes if desired. Explanations are also accessible through Stardog’s extended HTTP protocol and through SNARL. See the examples included in the distribution for more details about retrieving explanations programmatically.
30.1. Proof Trees
Proof trees[32] are a hierarchical presentation of multiple explanations (of inferences) that makes data, schemas, and rules more intelligible. A proof tree explains an inference or an inconsistency as a hierarchical structure: a node may represent an assertion in a Stardog database, and multiple assertion nodes are grouped under an inferred node.
30.1.1. Example
For example, if we are explaining the inferred triple :Alice
rdf:type :Employee, the root of the proof tree will show that
inference:
INFERRED :Alice rdf:type :Employee
The children of an inferred node will provide more explanation for that inference:
INFERRED :Alice rdf:type :Employee
    ASSERTED :Manager rdfs:subClassOf :Employee
    INFERRED :Alice rdf:type :Manager
The fully expanded proof tree will show the asserted triples and axioms for every inference:
INFERRED :Alice rdf:type :Employee
    ASSERTED :Manager rdfs:subClassOf :Employee
    INFERRED :Alice rdf:type :Manager
        ASSERTED :Alice :supervises :Bob
        ASSERTED :supervises rdfs:domain :Manager
The CLI explanation command prints the proof tree using indented text; but, using the SNARL API, it is easy to create a tree widget in a GUI to show the explanation tree, such that users can expand and collapse details in the explanation.
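To make the tree structure concrete, here is a small sketch of how a proof tree could be modeled and printed as indented text. ProofNode and render are hypothetical names invented for this illustration; they are not part of the SNARL API, and the four-space indent is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProofNode:
    """One node of a proof tree: a triple plus whether it is asserted or inferred."""
    kind: str                      # "ASSERTED" or "INFERRED"
    triple: str
    children: List["ProofNode"] = field(default_factory=list)

def render(node, depth=0, indent="    "):
    """Return the tree as a list of text lines, children one level deeper."""
    lines = [f"{indent * depth}{node.kind} {node.triple}"]
    for child in node.children:
        lines.extend(render(child, depth + 1, indent))
    return lines

# The fully expanded proof tree from the example above.
tree = ProofNode("INFERRED", ":Alice rdf:type :Employee", [
    ProofNode("ASSERTED", ":Manager rdfs:subClassOf :Employee"),
    ProofNode("INFERRED", ":Alice rdf:type :Manager", [
        ProofNode("ASSERTED", ":Alice :supervises :Bob"),
        ProofNode("ASSERTED", ":supervises rdfs:domain :Manager"),
    ]),
])
print("\n".join(render(tree)))
```

A GUI tree widget would walk the same structure, showing only the children of nodes the user has expanded.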
Another feature of proof trees is the ability to merge multiple explanations into a single proof tree with multiple branches when explanations have common statements. Consider the following example database:
#schema
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:supervises rdfs:domain :Manager
:ResearchProject rdfs:subClassOf :Project
:projectID rdfs:domain :Project
# instance data
:Alice :supervises :Bob
:Alice :manages :ProjectX
:ProjectX a :ResearchProject
:ProjectX :projectID "123-45-6789"
In this database, there are three distinct explanations
for the inference :Alice rdf:type :Employee:
Explanation 1
:Manager rdfs:subClassOf :Employee
:supervises rdfs:domain :Manager
:Alice :supervises :Bob
Explanation 2
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:ResearchProject rdfs:subClassOf :Project
:Alice :manages :ProjectX
:ProjectX a :ResearchProject
Explanation 3
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:projectID rdfs:domain :Project
:Alice :manages :ProjectX
:ProjectX :projectID "123-45-6789"
All three explanations have some triples in common; but when explanations are retrieved separately, it is hard to see how these explanations are related. When explanations are merged, we get a single proof tree where alternatives for subtrees of the proof are shown inline. In indented text rendering, the merged tree for the above explanations would look as follows:
INFERRED :Alice a :Employee
    ASSERTED :Manager rdfs:subClassOf :Employee
    1.1) INFERRED :Alice a :Manager
        ASSERTED :supervises rdfs:domain :Manager
        ASSERTED :Alice :supervises :Bob
    1.2) INFERRED :Alice a :Manager
        ASSERTED :ProjectManager rdfs:subClassOf :Manager
        INFERRED :Alice a :ProjectManager
            ASSERTED :ProjectManager owl:equivalentClass (:manages some :Project)
            ASSERTED :Alice :manages :ProjectX
            2.1) INFERRED :ProjectX a :Project
                ASSERTED :projectID rdfs:domain :Project
                ASSERTED :ProjectX :projectID "123-45-6789"
            2.2) INFERRED :ProjectX a :Project
                ASSERTED :ResearchProject rdfs:subClassOf :Project
                ASSERTED :ProjectX a :ResearchProject
In the merged proof tree, alternatives for an
explanation are shown with a number id. In the above tree,
:Alice a :Manager is the first inference for which we have
multiple explanations so it gets the id 1. Then each alternative
explanation gets an id appended to this (so explanations 1.1 and
1.2 are both alternative explanations for inference 1). We
also have multiple explanations for inference :ProjectX a :Project
so its alternatives get ids 2.1 and 2.2.
31. User-defined Rule Reasoning
Many reasoning problems may be solved with OWL’s axiom-based approach; but, of course, not all reasoning problems are amenable to this approach. A user-defined rules approach complements the OWL axiom-based approach nicely and increases the expressive power of a reasoning system from the user’s point of view. Many RDF databases support user-defined rules only. Stardog is the only RDF database that comprehensively supports both axioms and rules. Some problems (and some people) are simply a better fit for a rules-based approach to modeling and reasoning than to an axioms-based approach (and, of course, vice versa).
Note: There isn’t a one-size-fits-all answer to the question "rules or axioms or both?" Use the thing that makes the most sense given the task at hand. This is engineering, not religion.
Stardog supports user-defined rule reasoning together with a rich set of built-in functions using the SWRL syntax and built-ins library. In order to apply SWRL user-defined rules, you must include the rules as part of the database’s schema: that is, put your rules where your axioms are, i.e., in the schema. Once the rules are part of the schema, they will be used for reasoning automatically when using the SL reasoning level.
Assertions implied by the rules will not be materialized. Instead, rules are used to expand queries just as regular axioms are used.
Note: To trigger rules to fire, execute a relevant query: simple and easy as the truth.
31.1. Stardog Rules Syntax
Stardog supports two different syntaxes for defining rules. The first is native Stardog Rules syntax and is based on SPARQL, so you can re-use what you already know about SPARQL to write rules. Unless you have specific requirements otherwise, you should use this syntax for user-defined rules in Stardog. The second is the de facto standard RDF/XML syntax for SWRL. It has the advantage of being supported in many tools; but it’s not fun to read or to write. You probably don’t want to use it. Better: don’t use this syntax!
Stardog Rules Syntax is basically SPARQL "basic graph patterns" (BGPs)
plus some very explicit new bits (IF-THEN) to denote the head and the
body of a rule.[33] You define URI prefixes
in the normal way (examples below) and use regular SPARQL variables for
rule variables. As you can see, some SPARQL 1.1 syntactic
sugar—property paths, especially, but also bnode syntax—make complex
Stardog Rules concise and elegant.
31.1.1. How to Use Stardog Rules
There are three things to sort out:
- Where to put these rules?
- How to represent these rules?
- What are the gotchas?
First, the rules go into the database, of course; and, in particular,
they go into the named graph in which Stardog expects to find the TBox.
This setting by default is the "default graph", i.e., unless
you’ve changed the value of reasoning.schema.graphs, you’re probably
going to be fine; that is, just add the rules to the database and it
will all work out.[34]
Second, you represent the rules with specially constructed RDF triples. Here’s a kind of template example:
@prefix rule: <tag:stardog:api:rule:> .
[] a rule:SPARQLRule;
rule:content """
...la di dah the rule goes here!
""".
So there’s a namespace--tag:stardog:api:rule:--that has a predicate,
content, and a class, SPARQLRule. The object of this triple contains
one rule in Stardog Rules syntax. A more realistic example:
@prefix rule: <tag:stardog:api:rule:> .
[] a rule:SPARQLRule ;
rule:content """
PREFIX :<urn:test:>
IF {
?r a :Rectangle ;
:width ?w ;
:height ?h
BIND (?w * ?h AS ?area)
}
THEN {
?r :area ?area
}""" .
That’s pretty easy. Third, what are the gotchas?
- The RDF serialization of rules in, say, a Turtle file has to use the tag:stardog:api:rule: namespace URI and then whatever prefix mechanism, if any, is valid for that serialization. In the examples here, we use Turtle; hence, we use @prefix, etc.
- However, the namespace URIs used by the rules themselves can be defined in only two places: the string that contains the rule (in the example above, you can see the default namespace is urn:test:) or the Stardog database in which the rules are stored. Either place will work; if there are conflicts, the "closest definition wins", that is, if foo:Example is defined in both the rule content and in the Stardog database, the definition in the rule content is the one that Stardog will use.
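If rules are generated by a program rather than written by hand, the Turtle wrapper can be produced mechanically. The following is only a sketch: rule_to_turtle is a hypothetical helper, and it assumes the rule text carries its own PREFIX declarations, as in the examples here.

```python
RULE_NS = "tag:stardog:api:rule:"

def rule_to_turtle(rule_content):
    """Wrap one rule written in Stardog Rules syntax in the Turtle template."""
    if '"""' in rule_content:
        raise ValueError("rule content must not contain a triple double quote")
    # Turtle treats backslash as an escape character inside string literals.
    body = rule_content.strip().replace("\\", "\\\\")
    return (
        f"@prefix rule: <{RULE_NS}> .\n"
        "\n"
        "[] a rule:SPARQLRule ;\n"
        f'   rule:content """\n{body}\n""" .\n'
    )

# The rectangle-area rule from the example above.
area_rule = """
PREFIX :<urn:test:>
IF {
   ?r a :Rectangle ;
      :width ?w ;
      :height ?h
   BIND (?w * ?h AS ?area)
}
THEN {
   ?r :area ?area
}
"""
turtle = rule_to_turtle(area_rule)
print(turtle)
```

The resulting text can then be loaded into the database like any other Turtle data.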
31.1.2. Stardog Rules Examples
PREFIX rule: <tag:stardog:api:rule:>
PREFIX : <urn:test:>
PREFIX gr: <http://purl.org/goodrelations/v1#>
:Product1 gr:hasPriceSpecification [ gr:hasCurrencyValue 100.0 ] .
:Product2 gr:hasPriceSpecification [ gr:hasCurrencyValue 500.0 ] .
:Product3 gr:hasPriceSpecification [ gr:hasCurrencyValue 2000.0 ] .
[] a rule:SPARQLRule ;
rule:content """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX :<urn:test:>
IF {
?offering gr:hasPriceSpecification ?ps .
?ps gr:hasCurrencyValue ?price .
FILTER (?price >= 200.00).
}
THEN {
?offering a :ExpensiveProduct .
}
""".
This example is self-contained: it contains some data (the :Product…
triples) and a rule. It also demonstrates the use of SPARQL’s FILTER
to do numerical (and other) comparisons.
Here’s a more complex example that includes four rules and, again, some data.
PREFIX rule: <tag:stardog:api:rule:>
PREFIX : <urn:test:>
:c a :Circle ;
:radius 10 .
:t a :Triangle ;
:base 4 ;
:height 10 .
:r a :Rectangle ;
:width 5 ;
:height 8 .
:s a :Rectangle ;
:width 10 ;
:height 10 .
[] a rule:SPARQLRule ;
rule:content """
PREFIX :<urn:test:>
IF {
?r a :Rectangle ;
:width ?w ;
:height ?h
BIND (?w * ?h AS ?area)
}
THEN {
?r :area ?area
}""" .
[] a rule:SPARQLRule ;
rule:content """
PREFIX :<urn:test:>
IF {
?t a :Triangle ;
:base ?b ;
:height ?h
BIND (?b * ?h / 2 AS ?area)
}
THEN {
?t :area ?area
}""" .
[] a rule:SPARQLRule ;
rule:content """
PREFIX :<urn:test:>
PREFIX math: <http://www.w3.org/2005/xpath-functions/math#>
IF {
?c a :Circle ;
:radius ?r
BIND (math:pi() * math:pow(?r, 2) AS ?area)
}
THEN {
?c :area ?area
}""" .
[] a rule:SPARQLRule ;
rule:content """
PREFIX :<urn:test:>
IF {
?r a :Rectangle ;
:width ?w ;
:height ?h
FILTER (?w = ?h)
}
THEN {
?r a :Square
}""" .
This example also demonstrates how to use SPARQL’s BIND to introduce
intermediate variables and do calculations with or to them.
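As a sanity check on what these four rules should derive, the arithmetic can be emulated in a few lines of plain Python. This is not how Stardog evaluates rules, and the numeric datatypes Stardog returns may differ; the dictionary layout and the apply_rules helper are assumptions made for illustration.

```python
import math

# The instance data from the example, as plain dictionaries.
shapes = {
    ":c": {"type": ":Circle", "radius": 10},
    ":t": {"type": ":Triangle", "base": 4, "height": 10},
    ":r": {"type": ":Rectangle", "width": 5, "height": 8},
    ":s": {"type": ":Rectangle", "width": 10, "height": 10},
}

def apply_rules(shape):
    """Return (area, extra_types) that the four rules would derive for one shape."""
    extra_types = set()
    if shape["type"] == ":Rectangle":
        area = shape["width"] * shape["height"]            # BIND (?w * ?h AS ?area)
        if shape["width"] == shape["height"]:              # FILTER (?w = ?h)
            extra_types.add(":Square")
    elif shape["type"] == ":Triangle":
        area = shape["base"] * shape["height"] / 2         # BIND (?b * ?h / 2 AS ?area)
    elif shape["type"] == ":Circle":
        area = math.pi * math.pow(shape["radius"], 2)      # math:pi() * math:pow(?r, 2)
    else:
        area = None                                        # no rule applies
    return area, extra_types

for name, shape in shapes.items():
    print(name, apply_rules(shape))
```

Only :s satisfies the square rule, because it is the only rectangle whose width equals its height.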
Let’s look at some other rules, but just the rule content this time for concision, to see some use of other SPARQL features.
This rule says that a person between 13 and 19 (inclusive) years of age is a teenager:
PREFIX swrlb: <http://www.w3.org/2003/11/swrlb#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
IF {
?x a :Person; :hasAge ?age.
FILTER (?age >= 13 && ?age <= 19)
}
THEN {
?x a :Teenager.
}
This rule says that a male person with a sibling who is the parent of a female is an "uncle with a niece":
IF {
$x a :Person; a :Male; :hasSibling $y.
$y :isParentOf $z.
$z a :Female.
}
THEN {
$x a :UncleOfNiece.
}
We can use SPARQL 1.1 property paths (and bnodes for unnecessary
variables (that is, ones that aren’t used in the THEN)) to render this
rule even more concisely:
IF {
$x a :Person, :Male; :hasSibling/:isParentOf [a :Female]
}
THEN {
$x a :UncleOfNiece.
}
Aside: that’s pure awesome.
And of course a person who’s male and has a niece or nephew is an uncle of his niece(s) and nephew(s):
IF {
?x a :Male; :isSiblingOf/:isParentOf ?z
}
THEN {
?x :isUncleOf ?z.
}
Next rule example: a super user can read all of the things!
IF {
?x a :SuperUser.
?y a :Resource.
?z a <http://www.w3.org/ns/sparql#UUID>.
}
THEN {
?z a :Role.
?x :hasRole ?z; :readPermission ?y.
}
31.2. Supported Built-Ins
Stardog supports a wide variety of functions from SPARQL, XPath, SWRL, and some native Stardog functions, too. All of them may be used in either Stardog Rules syntax or in SWRL syntax. The supported functions are enumerated here.
32. Special Predicates
Stardog supports some builtin predicates with special meaning in order to make queries and rules easier to read and write. These special predicates are primarily syntactic sugar for more complex structures.
32.1. Direct/Strict Subclasses, Subproperties, & Direct Types
Besides the standard RDF(S) predicates rdf:type, rdfs:subClassOf and
rdfs:subPropertyOf, Stardog supports the following special built-in
predicates:
- sp:directType
- sp:directSubClassOf
- sp:strictSubClassOf
- sp:directSubPropertyOf
- sp:strictSubPropertyOf
Where the sp prefix binds to tag:stardog:api:property:. Stardog also
recognizes sesame:directType, sesame:directSubClassOf, and
sesame:strictSubClassOf predicates where the prefix sesame binds to
http://www.openrdf.org/schema/sesame#.
We show what each of these predicates means by relating it to an equivalent triple pattern; that is, you can just write the predicate rather than the (more unwieldy) triple pattern.
#c1 is a subclass of c2 but not equivalent to c2
:c1 sp:strictSubClassOf :c2 => :c1 rdfs:subClassOf :c2 .
FILTER NOT EXISTS {
:c1 owl:equivalentClass :c2 .
}
#c1 is a strict subclass of c2 and there is no c3 between c1 and c2 in
#the strict subclass hierarchy
:c1 sp:directSubClassOf :c2 => :c1 sp:strictSubClassOf :c2 .
FILTER NOT EXISTS {
:c1 sp:strictSubClassOf :c3 .
:c3 sp:strictSubClassOf :c2 .
}
#ind is an instance of c1 but not an instance of any strict subclass of c1
:ind sp:directType :c1 => :ind rdf:type :c1 .
FILTER NOT EXISTS {
:ind rdf:type :c2 .
:c2 sp:strictSubClassOf :c1 .
}
The predicates sp:directSubPropertyOf and sp:strictSubPropertyOf are defined analogously.
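The definitions above can be checked against a small hierarchy with a sketch like the following. It is a plain-Python model, not Stardog's implementation: it assumes that subclass reasoning yields the reflexive-transitive closure of the asserted axioms and that two classes are equivalent exactly when each is a subclass of the other. All function names are hypothetical.

```python
def closure(pairs):
    """Reflexive-transitive closure of a set of (sub, super) pairs."""
    classes = {c for pair in pairs for c in pair}
    reach = set(pairs) | {(c, c) for c in classes}
    changed = True
    while changed:
        changed = False
        for a, b in list(reach):
            for c, d in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach

def strict_subclasses(pairs):
    """(c1, c2) where c1 is a subclass of c2 but not equivalent to c2."""
    sub = closure(pairs)
    return {(a, b) for a, b in sub if (b, a) not in sub}

def direct_subclasses(pairs):
    """Strict pairs with no third class strictly in between."""
    strict = strict_subclasses(pairs)
    classes = {c for pair in pairs for c in pair}
    return {
        (a, b) for a, b in strict
        if not any((a, m) in strict and (m, b) in strict for m in classes)
    }

hierarchy = {
    (":SeniorManager", ":Manager"),
    (":Manager", ":Employee"),
}
print(sorted(strict_subclasses(hierarchy)))
print(sorted(direct_subclasses(hierarchy)))
```

Here :SeniorManager is a strict subclass of :Employee but not a direct one, because :Manager sits between them.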
32.2. New Individuals with SWRL
Stardog also supports a special predicate that extends the expressivity of SWRL rules. According to SWRL, you can’t create new individuals (i.e., new instances of classes) in a SWRL rule.
Note: Don’t get hung up by the tech vocabulary here…"new individual" just means that you can’t have a rule that creates a new instance of some RDF or OWL class as a result of the rule firing.
This restriction is well-motivated; without it, you can easily create rules that do not terminate, that is, never reach a fixed point. Stardog’s user-defined rules weaken this restriction in some crucial aspects, subject to the following restrictions, conditions, and warnings.
Warning: This special predicate is basically a loaded gun with which you may shoot yourself in the foot if you aren’t very careful.
So despite the general restriction in SWRL, in Stardog we actually can
create new individuals with a rule by using the function UUID() as
follows:
IF {
?p a :Parent .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Person .
}
Note: Alternatively, we can use the predicate http://www.w3.org/ns/sparql#UUID as a unary SWRL built-in.
This rule will create a random URI for each instance of the class
:Parent and also assert that each new instance is an instance of
:Person--parents are people, too!
32.2.1. Remarks
- The URIs for the generated individuals are meaningless in the sense that they should not be used in further queries; that is to say, these URIs are not guaranteed by Stardog to be stable.
- Due to normalization, rules with more than one atom in the head are broken up into several rules.
Thus,
IF {
?person a :Person .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Parent ;
a :Male .
}
will be normalized into two rules:
IF {
?person a :Person .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Parent .
}
IF {
?person a :Person .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Male .
}
As a consequence, instead of stating that the new individual is both an instance
of :Male and :Parent, we would create two different new individuals and
assert that one is male and the other is a parent. If you need to assert various
things about the new individual, we recommend the use of extra rules or axioms. In
the previous example, we can introduce a new class (:Father) and add the
following rule to our schema:
IF {
?person a :Father .
}
THEN {
?person a :Parent ;
a :Male .
}
And then modify the original rule accordingly:
IF {
?person a :Person .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Father .
}
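The effect of normalization on UUID() can be mimicked with a short simulation. This is a sketch under stated assumptions: the urn:uuid: form of the generated URIs, the :Alice binding, and all function names are made up for illustration and say nothing about the identifiers Stardog actually produces.

```python
import uuid

def fire(person, head_types):
    """Fire one rule for one ?person binding: mint a fresh URI and type it.

    `person` only stands for the binding that triggers the rule.
    """
    new_individual = f"urn:uuid:{uuid.uuid4()}"
    return {(new_individual, "a", t) for t in head_types}

def fire_normalized(person, head_types):
    """After normalization there is one rule per head atom, so UUID() runs once per rule."""
    triples = set()
    for t in head_types:
        triples |= fire(person, [t])
    return triples

def father_rule(triples):
    """IF ?x a :Father THEN ?x a :Parent, :Male (no UUID(), so splitting is harmless)."""
    derived = set(triples)
    for subject, _, cls in triples:
        if cls == ":Father":
            derived |= {(subject, "a", ":Parent"), (subject, "a", ":Male")}
    return derived

split = fire_normalized(":Alice", [":Parent", ":Male"])
fixed = father_rule(fire(":Alice", [":Father"]))
print(len({s for s, _, _ in split}), len({s for s, _, _ in fixed}))
```

The split version yields two unrelated individuals, while routing through :Father keeps all three type assertions on a single new individual.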
33. Query Rewriting
Reasoning in Stardog is based (mostly) on a query rewriting technique: Stardog rewrites the user’s query with respect to any schema or rules, and then executes the resulting expanded query (EQ) against the data in the normal way. This process is completely automated and requires no intervention from the user.
As can be seen in Figure 1, the rewriting process involves five different phases.
We illustrate the query answering process by means of an example. Consider a Stardog database, MyDB1, containing the following schema:
:SeniorManager rdfs:subClassOf :manages some :Manager
:manages some :Employee rdfs:subClassOf :Manager
:Manager rdfs:subClassOf :Employee
Which says that a senior manager manages at least one manager, that every person that manages an employee is a manager, and that every manager is also an employee.
Let’s also assume that MyDB1 contains the following data assertions:
:Bill rdf:type :SeniorManager
:Robert rdf:type :Manager
:Ana :manages :Lucy
:Lucy rdf:type :Employee
Finally, let’s say that we want to retrieve the set of all employees. We do this by posing the following query:
SELECT ?employee WHERE { ?employee rdf:type :Employee }
To answer this query, Stardog first rewrites it using the information in the schema. So the original query is rewritten into four queries:
SELECT ?employee WHERE { ?employee rdf:type :Employee }
SELECT ?employee WHERE { ?employee rdf:type :Manager }
SELECT ?employee WHERE { ?employee rdf:type :SeniorManager }
SELECT ?employee WHERE { ?employee :manages ?x. ?x rdf:type :Employee }
Then Stardog executes these queries over the data as if they were written that way to begin with. In fact, Stardog can’t tell that they weren’t. Reasoning in Stardog just is query answering in nearly every case.
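The example can be reproduced with a toy rewriter. The sketch below is not Stardog's algorithm: it hard-codes three axiom shapes in an ad hoc encoding, handles only this kind of schema, and every name in it is hypothetical. It does show the essential idea, namely that the schema is consulted only to expand the query, and the expanded patterns are then matched against the asserted data alone.

```python
# Toy encoding of the example schema (not Stardog's internal representation).
SUBCLASS = {(":Manager", ":Employee")}                       # A subClassOf B
SUB_OF_SOME = {(":SeniorManager", ":manages", ":Manager")}   # A subClassOf (p some B)
SOME_SUB_OF = {(":manages", ":Employee", ":Manager")}        # (p some A) subClassOf B

DATA = {
    (":Bill", "rdf:type", ":SeniorManager"),
    (":Robert", "rdf:type", ":Manager"),
    (":Ana", ":manages", ":Lucy"),
    (":Lucy", "rdf:type", ":Employee"),
}

def is_subclass(sub, sup):
    """True if sub reaches sup through zero or more named subclass axioms."""
    seen, todo = set(), [sub]
    while todo:
        cls = todo.pop()
        if cls == sup:
            return True
        if cls not in seen:
            seen.add(cls)
            todo.extend(b for a, b in SUBCLASS if a == cls)
    return False

def subsumed_classes(target):
    """Named classes whose instances are necessarily instances of target."""
    found = {target}
    changed = True
    while changed:
        changed = False
        for a, b in SUBCLASS:
            if b in found and a not in found:
                found.add(a)
                changed = True
        # A subClassOf (p some B), B subClassOf* C, (p some C) subClassOf D
        for a, p, b in SUB_OF_SOME:
            for q, c, d in SOME_SUB_OF:
                if p == q and d in found and a not in found and is_subclass(b, c):
                    found.add(a)
                    changed = True
    return found

def rewrite(target):
    """Expand '?x rdf:type target' into a list of simpler query patterns."""
    classes = subsumed_classes(target)
    queries = [("type", cls) for cls in sorted(classes)]
    queries += [("some", p, c) for p, c, d in sorted(SOME_SUB_OF) if d in classes]
    return queries

def evaluate(query):
    """Run one expanded pattern over the asserted data only."""
    if query[0] == "type":
        return {s for s, p, o in DATA if p == "rdf:type" and o == query[1]}
    _, prop, cls = query
    return {s for s, p, o in DATA if p == prop and (o, "rdf:type", cls) in DATA}

expanded = rewrite(":Employee")
answers = set().union(*(evaluate(q) for q in expanded))
print(len(expanded), sorted(answers))
```

The four entries in expanded correspond to the four rewritten queries listed above, and their union returns all four people even though only :Lucy is asserted to be an :Employee.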
The form of the EQ depends on the reasoning level. For OWL 2 QL, every EQ produced by Stardog is guaranteed to be expanded into a set of queries. If the reasoning level is OWL 2 RL or EL, then the EQ may or may not include a recursive rule. If a recursive rule is included, Stardog’s answers will be sound but incomplete with respect to the semantics of the reasoning level.
33.1. Why Query Rewriting?
Query rewriting has several advantages over materialization. In materialization, the data gets expanded with respect to the schema, not with respect to any actual query. And it’s the data (all of the data) that gets expanded, whether any actual query subsequently requires reasoning or not. The schema is used to generate new triples, typically when data is added or removed from the system. However, materialization introduces several thorny issues:
- Data freshness. Materialization has to be performed every time the data or the schema changes. This is particularly unsuitable for applications where the data changes frequently.
- Data size. Depending on the schema, materialization can significantly increase the size of the data, sometimes dramatically so. The cost of this data size blowup may be applied to every query in terms of increased I/O.
- OWL 2 profile reasoning. Given the fact that QL, RL, and EL are not comparable with respect to expressive power, an application that requires reasoning with more than one profile would need to maintain different corresponding materialized versions of the data.
- Resources. Depending on the size of the original data and the complexity of the schema, materialization may be computationally expensive. And truth maintenance, which materialization requires, is always computationally expensive.
34. Performance Hints
The query rewriting approach suggests some guidelines for more efficient query answering.
34.1. Hierarchies and Queries
- Avoid unnecessarily deep class/property hierarchies.
If you do not need to model several different types of a given class or property in your schema, then don’t do that! The reason shallow hierarchies are desirable is that the maximal hierarchy depth in the schema partly determines the maximal size of the EQs produced by Stardog. The larger the EQ, the longer it takes to evaluate, generally.
For example, suppose our schema contains a very thorough and detailed set of subclasses of the class :Employee:
:Manager rdfs:subClassOf :Employee
:SeniorManager rdfs:subClassOf :Manager
...
:Supervisor rdfs:subClassOf :Employee
:DepartmentSupervisor rdfs:subClassOf :Supervisor
...
:Secretary rdfs:subClassOf :Employee
...
If we wanted to retrieve the set of all employees, Stardog would produce an EQ containing a query of the following form for every subclass :Ci of :Employee:
SELECT ?employee WHERE { ?employee rdf:type :Ci }
Thus, ask the most specific query sufficient for your use case. Why? More general queries (that is, queries that contain concepts high up in the class hierarchy defined by the schema) will typically yield larger EQs.
34.2. Domains and Ranges
- Specify domain and range of the properties in the schema.
These types of axiom can improve query performance significantly. Consider the following query asking for people and the employees they manage:
SELECT ?manager ?employee WHERE { ?manager :manages ?employee. ?employee rdf:type :Employee. }
We know that this query would cause a large EQ given a deep hierarchy of :Employee subclasses. However, if we added the following single range axiom:
:manages rdfs:range :Employee
then the EQ would collapse to
SELECT ?manager ?employee WHERE { ?manager :manages ?employee }
which is considerably easier to evaluate.
35. Not Seeing Expected Results?
Here are a few things that you might want to consider.
35.1. Are variable types ambiguous?
When a SPARQL query gets executed, each variable is bound to a URI, blank node, or to a literal to form a particular result (a collection of these results is a result set). In the context of reasoning, URIs might represent different entities: individuals, classes, properties, etc. According to the relevant standard, every variable in a SPARQL query must bind to at most one of these types of entity.
Stardog can often figure out the right entity type from the query itself
(e.g., given the triple pattern ?i ?p "a literal", we know ?p is
supposed to bind to a data property); however, sometimes this isn’t
possible (e.g., ?s ?p ?o). In case the types can’t be determined
automatically, Stardog logs a message and evaluates the query without
any reasoning.
Note: This bears repeating since it’s a frequently asked question: if Stardog cannot determine the types of variables in a query for which reasoning is requested, Stardog will log a message and evaluate the query without reasoning.
You can add one or more type triples to the query to resolve these ambiguities.[35]
These "type triples" have the form ?var a TYPE, where TYPE is a URI
representing the type of entity to which the variable ?var is supposed
to bind: the most common are owl:ObjectProperty or
owl:DatatypeProperty; in some cases, you might want
owl:NamedIndividual, or owl:Class. For instance, if you are
interested in all the object properties of :i1, you can use the
following query:
SELECT ?o
WHERE {
:i1 ?p ?o.
?p a owl:ObjectProperty.
}
Since Stardog now knows that ?p should bind to an object property, we can now
infer that ?o binds to an individual, so there are no ambiguities and
reasoning can be performed as requested.
35.2. Is the schema where you think it is?
Stardog might be extracting the wrong schema. You have to tell Stardog where to find the schema. See database configuration options for details.
35.3. Are you using the right reasoning level?
Perhaps some of the modeling constructs (a.k.a. axioms) in your database are
being ignored. You can find out which axioms are being ignored by including the
following line in the logging.properties file in STARDOG_HOME:
com.clarkparsia.blackout.level = ALL
35.4. Are you using DL?
Stardog supports schema-only reasoning for OWL 2 DL, which effectively means that only TBox queries—queries that contain TBox BGPs only—will return complete query results.
35.5. Are you using SWRL?
As of version 2.0, SWRL rules are only taken into account using the SL reasoning level.
35.6. Do you know what to expect?
The OWL 2 primer is a good place to start.
36. Known Issues
Stardog 2.2.4 does not:
- Follow ontology owl:imports statements automatically; any imported OWL ontologies that are required must be loaded into a Stardog database in the normal way.
- Handle recursive queries. If recursion is necessary to answer the query with respect to the schema, results will be sound (no wrong answers) but potentially incomplete (some correct answers not returned) with respect to the requested reasoning type.
- Perform equality reasoning. Only explicit owl:sameAs and owl:differentFrom data assertions will be taken into account for query answering.[36]
- Perform datatype reasoning or respect user-defined datatypes.
37. Terminology
This chapter uses the following terms of art.
37.1. Databases
A database (DB), a.k.a. ontology, is composed of two different parts: the schema or Terminological Box (TBox) and the data or Assertional Box (ABox). Analogous to relational databases, the TBox can be thought of as the schema, and the ABox as the data. In other words, the TBox is a set of axioms, whereas the ABox is a set of assertions.
As we explain in OWL 2 Profiles, the kinds of assertion and axiom that one might use for a particular database are determined by the fragment of OWL 2 to which you’d like to adhere. In general, you should choose the OWL 2 profile that most closely fits the data modeling needs of your application.
The most common data assertions are class and property assertions. Class assertions are used to state that a particular individual is an instance of a given class. Property assertions are used to state that two particular individuals (or an individual and a literal) are related via a given property. For example, suppose we have a DB MyDB2 that contains the following data assertions. We use the usual standard prefixes for RDF(S) and OWL.
:clark_and_parsia rdf:type :Company
:clark_and_parsia :maintains :Stardog
Which says that :clark_and_parsia is a company, and that
:clark_and_parsia maintains :Stardog.
The most common schema axioms are subclass axioms. Subclass axioms are used to state that every instance of a particular class is also an instance of another class. For example, suppose that MyDB2 contains the following TBox axiom:
:Company rdfs:subClassOf :Organization
stating that companies are a type of organization.
37.2. Queries
When reasoning is enabled, Stardog executes SPARQL queries depending on the type of Basic Graph Patterns they contain. A BGP is said to be an "ABox BGP" if it is of one of the following forms:
- term1 rdf:type uri
- term1 uri term2
- term1 owl:differentFrom term2
- term1 owl:sameAs term2
A BGP is said to be a TBox BGP if it is of one of the following forms:
- term1 rdfs:subClassOf term2
- term1 owl:disjointWith term2
- term1 owl:equivalentClass term2
- term1 rdfs:subPropertyOf term2
- term1 owl:equivalentProperty term2
- term1 owl:inverseOf term2
- term1 owl:propertyDisjointWith term2
- term1 rdfs:domain term2
- term1 rdfs:range term2
A BGP is said to be a Hybrid BGP if it is of one of the following forms:
- term1 rdf:type ?var
- term1 ?var term2
where term (possibly with subscripts) is either a URI or a variable; uri is a URI; and ?var is a variable.
When executing a query, ABox BGPs are handled by Stardog. TBox BGPs are executed by Pellet embedded in Stardog. Hybrid BGPs are handled by a combination of both.
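The three lists of forms translate directly into a small classifier. The sketch below works on triple patterns given as prefixed-name strings; the classify and is_var helpers are hypothetical and only restate the lists, they do not reflect how Stardog dispatches patterns internally.

```python
TBOX_PREDICATES = {
    "rdfs:subClassOf", "owl:disjointWith", "owl:equivalentClass",
    "rdfs:subPropertyOf", "owl:equivalentProperty", "owl:inverseOf",
    "owl:propertyDisjointWith", "rdfs:domain", "rdfs:range",
}

def is_var(term):
    """SPARQL variables start with ? or $."""
    return term.startswith("?") or term.startswith("$")

def classify(subject, predicate, obj):
    """Classify one triple pattern as 'ABox', 'TBox', or 'Hybrid'."""
    if is_var(predicate):
        return "Hybrid"                       # term1 ?var term2
    if predicate in TBOX_PREDICATES:
        return "TBox"
    if predicate in ("rdf:type", "a"):
        return "Hybrid" if is_var(obj) else "ABox"
    return "ABox"                             # term1 uri term2, owl:sameAs, owl:differentFrom

print(classify("?employee", "rdf:type", ":Employee"))
print(classify("?c", "rdfs:subClassOf", ":Employee"))
print(classify(":i1", "?p", "?o"))
```

A pattern such as ?s ?p ?o is Hybrid under this scheme, since its predicate is a variable.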
37.3. Reasoning
Intuitively, reasoning with a DB means to make implicit knowledge explicit. There are two main use cases for reasoning: to infer implicit knowledge and to discover modeling errors.
With respect to the first use case, recall that MyDB2 contains the following assertion and axiom:
:clark_and_parsia rdf:type :Company
:Company rdfs:subClassOf :Organization
From this DB, we can use Stardog in order to infer that
:clark_and_parsia is an organization:
:clark_and_parsia rdf:type :Organization
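To make the inference step concrete, here is a minimal sketch, in plain Java, of how subclass axioms yield inferred types. It is not how Stardog performs reasoning; SubclassInference and its methods are hypothetical names, and the sketch covers rdfs:subClassOf only.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derives implicit rdf:type statements from subclass axioms.
final class SubclassInference {
    // Direct rdfs:subClassOf axioms: subclass -> direct superclasses.
    private final Map<String, Set<String>> superClasses = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    // Asserted types plus every superclass reachable from them.
    Set<String> inferTypes(Set<String> assertedTypes) {
        Set<String> result = new HashSet<>(assertedTypes);
        Deque<String> todo = new ArrayDeque<>(assertedTypes);
        while (!todo.isEmpty()) {
            String cls = todo.pop();
            for (String sup : superClasses.getOrDefault(cls, Collections.<String>emptySet())) {
                if (result.add(sup)) {
                    todo.push(sup); // newly found class: follow its superclasses too
                }
            }
        }
        return result;
    }

    // The MyDB2 example: :clark_and_parsia is asserted to be a :Company.
    static Set<String> demo() {
        SubclassInference schema = new SubclassInference();
        schema.addSubClassOf(":Company", ":Organization");
        return schema.inferTypes(Collections.singleton(":Company"));
    }

    public static void main(String[] args) {
        System.out.println(demo().contains(":Organization")); // true
    }
}
```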
Using reasoning to infer implicit knowledge in the context of an enterprise application can lead to simpler queries. Suppose, for example, that MyDB2 contains a complex class hierarchy including several types of organization (including company), and that our application needs Stardog to list all organizations. With reasoning enabled, we need only issue the following simple query:
SELECT ?org WHERE { ?org rdf:type :Organization}
In contrast, if we were using Stardog with no reasoning, then we would have to issue a more complex query that considers all possible types of organization, thus coupling queries to domain knowledge in a tight way:
SELECT ?org WHERE
{ { ?org rdf:type :Organization } UNION
{ ?org rdf:type :Company } UNION
...
}
Which of these queries seems more loosely coupled and more resilient to change?
Stardog can also be used in order to discover modeling errors in a DB. The most common modeling errors are unsatisfiable classes and inconsistent DBs.
An unsatisfiable class is simply a class that cannot have any instances. Say, for example, that we added the following axioms to MyDB2:
:Company owl:disjointWith :Organization
:LLC owl:equivalentClass :Company and :Organization
stating that companies cannot be organizations and vice versa, and that
an LLC is a company and an organization. The disjointness axiom causes
the class :LLC to be unsatisfiable because, for the DB to be
free of any logical contradiction, there can be no instances of :LLC.
Asserting (or inferring) that an unsatisfiable class has an instance causes the
DB to be inconsistent. In the particular case of MyDB2, we know that
:clark_and_parsia is a company and an organization; therefore, we also know
that it is an instance of :LLC, and as :LLC is known to be unsatisfiable, we
have that MyDB2 is inconsistent.
Using reasoning in order to discover modeling errors in the context of
an enterprise application is useful in order to maintain a correct
contradiction-free model of the domain. In our example, we discovered
that :LLC is unsatisfiable and MyDB2 is inconsistent, which leads us
to believe that there is a modeling error in our DB. In this case, it is
easy to see that the problem is the disjointness axiom between
:Company and :Organization.
37.4. OWL 2 Profiles
As explained in the OWL 2 Web Ontology Language Profiles Specification, an OWL 2 profile is a reduced version of OWL 2 that trades some expressive power for efficiency of reasoning. There are three OWL 2 profiles, each of which achieves efficiency differently.
-
OWL 2 QL is aimed at applications that use very large volumes of instance data, and where query answering is the most important reasoning task. The expressive power of the profile is necessarily limited; however, it includes most of the main features of conceptual models such as UML class diagrams and ER diagrams.
-
OWL 2 EL is particularly useful in applications employing ontologies that contain very large numbers of properties and/or classes. This profile captures the expressive power used by many such ontologies and is a subset of OWL 2 for which the basic reasoning problems can be performed in time that is polynomial with respect to the size of the ontology.
-
OWL 2 RL is aimed at applications that require scalable reasoning without sacrificing too much expressive power. It is designed to accommodate OWL 2 applications that can trade the full expressivity of the language for efficiency, as well as RDF(S) applications that need some added expressivity.
Each profile restricts the kinds of axiom and assertion that can be used in a DB. Colloquially, QL is the least expressive of the profiles, followed by RL and EL; however, strictly speaking, no profile is more expressive than any other as they provide incomparable sets of constructs.
Stardog supports the three profiles of OWL 2. Notably, since TBox BGPs are handled completely by Pellet, Stardog supports reasoning for the whole of OWL 2 for queries containing TBox BGPs only.
Validating Constraints
Stardog Integrity Constraint Validation ("ICV") validates RDF data stored in a Stardog database according to data rules (i.e., "constraints") described by users and that make sense for their domain, application, and data. These constraints may be written in SPARQL, OWL, or SWRL. This chapter explains how to use ICV.
The use of high-level languages (OWL 2, SWRL, and SPARQL) to validate RDF data using closed world semantics is one of Stardog’s unique capabilities. Using high-level languages like OWL, SWRL, and SPARQL as schema or constraint languages for RDF and Linked Data has several advantages:
-
Unifying the domain model with data quality rules
-
Aligning the domain model and data quality rules with the integration model and language (i.e., RDF)
-
Being able to query the domain model, data quality rules, integration model, mapping rules, etc. with SPARQL
-
Being able to use automated reasoning about all of these things to ensure logical consistency, explain errors and problems, etc.
If you are also interested in theory and background, please see the ICV specification, which has all the formal details.
38. Getting Started with ICV
This log of a CLI session gives a full example of how to validate data using a mix of integrity constraints expressed in OWL and SPARQL. It uses data and constraints linked below.
# Stardog commands and the output for RDF validation example
# First create the Stardog database and load data
$ ./stardog-admin db create -n sota sota-data.ttl
Bulk loading data to new database.
Loading data completed...Loaded 23 triples in 00:00:00 @ 0.4K triples/sec.
Successfully created database 'sota'.
# Then add the constraints to the database
$ ./stardog-admin icv add sota sota-constraints.ttl
Successfully added constraints in 00:00:00.
# Now run the validation command
# This command just prints which constraints are violated, see the Java
# example for printing the details about validation
$ ./stardog icv validate sota
Data is NOT valid.
The following constraints were violated:
AxiomConstraint{:related rdfs:range :Issue}
AxiomConstraint{:reportedOn rdfs:domain :Issue}
AxiomConstraint{:Issue rdfs:subClassOf (:reportedBy exactly 1 owl:Thing)}
AxiomConstraint{:Issue rdfs:subClassOf (:reproducedBy min 0 owl:Thing)}
AxiomConstraint{:reproducedBy rdfs:range foaf:Person}
AxiomConstraint{:reportedBy rdfs:range foaf:Person}
AxiomConstraint{:Issue rdfs:subClassOf (:related min 0 owl:Thing)}
AxiomConstraint{:state rdfs:domain :Issue}
AxiomConstraint{:state rdfs:range :ValidState}
AxiomConstraint{:Issue rdfs:subClassOf (:reproducedOn min 0 rdfs:Literal)}
# We can also add SPARQL queries as constraints
$ ./stardog-admin icv add sota sota-query.sparql
# We can run validation with a mixture of OWL constraints and SPARQL constraints
$ ./stardog icv validate sota
Data is NOT valid.
...
See the following Gists to follow along at home:
And, finally, the full Gist with links to everything. In the rest of this chapter, we explain in more detail about programmatic access, as well as give a full slate of examples of ICV in action.
39. ICV & OWL 2 Reasoning
An integrity constraint may be satisfied or violated in either of two ways: by an explicit statement in a Stardog database or by a statement that’s been validly inferred by Stardog.
When ICV is enabled for a Stardog database, it has to be enabled with respect to a reasoning type or level. The valid choices of reasoning type are any type or kind of reasoning supported by Stardog. See Stardog’s reasoning & inference chapter for the details.
So ICV is performed with three inputs:
-
a Stardog database,
-
a set of constraints, and
-
a reasoning type (which may be, of course, no reasoning).
This is the case because domain modelers, ontology developers, or integrity constraint authors must consider the interactions between explicit and inferred statements and how these are accounted for in integrity constraints.
40. Using ICV from CLI
To add constraints to a database:
$ stardog-admin icv add myDb constraints.rdf
To drop all constraints from a database:
$ stardog-admin icv drop myDb
To remove one or more specific constraints from a database:
$ stardog-admin icv remove myDb constraints.rdf
To convert new or existing constraints into SPARQL queries for export:
$ stardog icv convert myDb out.rdf
To explain a constraint violation:
$ stardog icv explain --contexts http://example.org/context1 http://example.org/context2 -- myDb
To export constraints:
$ stardog icv export myDb constraints.rdf
To validate a database (or some named graphs) with respect to constraints:
$ stardog icv validate --contexts http://example.org/context1 http://example.org/context2 -- myDb
41. ICV Guard Mode
Stardog will also apply constraints as part of its transactional cycle and fail transactions that violate constraints. We call this "guard mode". It must be enabled explicitly in the database configuration options. Using the command line, these steps are as follows:
$ ./stardog-admin db offline --timeout 0 myDb #take the database offline
$ ./stardog-admin db metadata set icv.enabled=true myDb #enable ICV
$ ./stardog-admin db online myDb #put the database online
In the Web Console you can set the database offline, click Edit, change the "ICV Enable" value, click Save and set the database online again.
Once guard mode is enabled, modifications of the database (via SPARQL Update or any other method), whether adds or deletes, that violate the integrity constraints will cause the transaction to fail.
42. Explaining ICV Violations
ICV violations can be explained using Stardog’s Proof Trees. The following command will explain the IC violations for constraints stored in the database:
$ stardog icv explain --reasoning EL "myDB"
It is possible to explain violations for external constraints by passing the file with constraints as an additional argument:
$ stardog icv explain --reasoning EL "myDB" constraints.ttl
42.1. Security Note
|
Warning
|
There is a security implication in this design choice that may not be obvious. Changing the reasoning type associated with a database and integrity constraint validation may have serious security implications with respect to a Stardog database and, thus, may only be performed by a user role with sufficient privileges for that action. |
43. ICV Examples
Stardog ICV has a formal semantics. But let’s
just look at some examples instead; these examples use OWL 2 Manchester syntax,
and they assume a simple data schema, which is available as an
OWL ontology and as a UML
diagram. The examples assume that the default namespace is
http://example.com/company.owl# and that xsd: is bound to the standard,
http://www.w3.org/2001/XMLSchema#.
Reference Java code is available for each of the following examples and is also distributed with Stardog.
43.1. Subsumption Constraints
This kind of constraint guarantees certain subclass and superclass (i.e., subsumption) relationships exist between instances.
43.1.1. Managers must be employees.
Database B (valid)
:Alice a :Manager , :Employee .
This constraint says that if an RDF individual is an instance of
Manager, then it must also be an instance of Employee. In
A, the only instance of Manager, namely Alice, is not an instance of
Employee; therefore, A is invalid. In B, Alice is an instance of
both Manager and Employee; therefore, B is valid.
43.2. Domain-Range Constraints
These constraints control the types of domain and range instances of properties.
43.2.1. Only project leaders can be responsible for projects.
Database C (valid)
:Alice a :Project_Leader ;
:is_responsible_for :MyProject .
:MyProject a :Project .
This constraint says that if two RDF instances are related to each other via the
property is_responsible_for, then the domain instance must be an instance of
Project_Leader and the range instance must be an instance of Project. In
Database A, there is only one pair of individuals related via
is_responsible_for, namely (Alice, MyProject), and MyProject is an
instance of Project; but Alice is not an instance of Project_Leader.
Therefore, A is invalid. In B, Alice is an instance of Project_Leader, but
MyProject is not an instance of Project; therefore, B is not valid. In C,
Alice is an instance of Project_Leader, and MyProject is an instance of
Project; therefore, C is valid.
43.2.2. Only employees can have an SSN.
Database B (valid)
:Bob a :Employee ;
:ssn "123-45-6789" .
This constraint says that if an RDF instance i has a data assertion via
the property SSN, then i must be an instance of Employee. In A, Bob is
not an instance of Employee but has SSN; therefore, A is invalid. In B,
Bob is an instance of Employee; therefore, B is valid.
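The closed-world reading of this constraint can be sketched in plain Java. This is an illustration only, not Stardog's ICV implementation; the class DomainConstraintCheck is hypothetical, triples are modeled as {subject, predicate, object} string arrays, and Database A is reconstructed from the description above (Bob has an SSN but no Employee type).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: closed-world check of "only employees can have an SSN".
final class DomainConstraintCheck {
    static boolean hasType(List<String[]> triples, String subject, String cls) {
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals("rdf:type") && t[2].equals(cls)) {
                return true;
            }
        }
        return false;
    }

    // Subjects that use `property` without being explicitly typed as `requiredClass`.
    static List<String> violations(List<String[]> triples, String property, String requiredClass) {
        List<String> bad = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(property)
                    && !hasType(triples, t[0], requiredClass)
                    && !bad.contains(t[0])) {
                bad.add(t[0]);
            }
        }
        return bad;
    }

    // Database A (assumed from the prose): Bob has an SSN but is not an Employee.
    static List<String[]> databaseA() {
        return Arrays.asList(new String[][] {
            {":Bob", ":ssn", "123-45-6789"}
        });
    }

    // Database B: Bob is an Employee and has an SSN.
    static List<String[]> databaseB() {
        return Arrays.asList(new String[][] {
            {":Bob", "rdf:type", ":Employee"},
            {":Bob", ":ssn", "123-45-6789"}
        });
    }

    public static void main(String[] args) {
        System.out.println(violations(databaseA(), ":ssn", ":Employee")); // [:Bob]
        System.out.println(violations(databaseB(), ":ssn", ":Employee")); // []
    }
}
```

Under open world semantics the same axiom would instead let a reasoner infer that Bob is an Employee; the closed-world check treats the missing type as a violation.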
43.2.3. A date of birth must be a date.
Database B (valid)
:Bob :dob "1970-01-01"^^xsd:date
This constraint says that if an RDF instance i is related to a literal
l via the data property DOB, then l must have the XML Schema type
xsd:date. In A, Bob is related to the untyped literal
"1970-01-01" via DOB so A is invalid. In B, the literal
"1970-01-01" is properly typed so it’s valid.
43.3. Participation Constraints
These constraints control whether or not an RDF instance participates in some specified relationship.
43.3.1. Each supervisor must supervise at least one employee.
Constraint
#this constraint is very concise in Terp syntax:
#:Supervisor rdfs:subClassOf (:supervises some :Employee)
:Supervisor rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :supervises ;
owl:someValuesFrom :Employee
] .
Database D (valid)
:Alice a :Supervisor ;
:supervises :Bob .
:Bob a :Employee .
This constraint says that if an RDF instance i is of type Supervisor, then
i must be related to an individual j via the property supervises and also
that j must be an instance of Employee. In A, Supervisor has no instances;
therefore, A is trivially valid. In B, the only instance of Supervisor, namely
Alice, is related to no individual; therefore, B is invalid. In C, Alice is
related to Bob via supervises, but Bob is not an instance of Employee;
therefore, C is invalid. In D, Alice is related to Bob via supervises, and
Bob is an instance of Employee; hence, D is valid.
43.3.2. Each project must have a valid project number.
Constraint
#Again, this constraint in Terp syntax rocks the hizzous:
#:Project rdfs:subClassOf (:number some xsd:integer[>= 0, < 5000])
:Project rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :number ;
owl:someValuesFrom
[ a rdfs:Datatype ;
owl:onDatatype xsd:integer ;
owl:withRestrictions ([xsd:minInclusive 0] [ xsd:maxExclusive 5000])
]
] .
Database E (valid)
:MyProject a :Project ;
:number "23"^^xsd:integer .
This constraint says that if an RDF instance i is of type Project, then i
must be related via the property number to an integer that is at least 0 and
less than 5000; that is, projects have project numbers in a certain range. In A,
the individual MyProject is not known to be an instance of Project so the
constraint does not apply at all and A is valid. In B, MyProject is an
instance of Project but doesn’t have any data assertions via number so B is
invalid. In C, MyProject does have a data property assertion via number but
the literal "23" is untyped (that is, it’s not an integer); therefore, C is
invalid. In D, MyProject is related to an integer via number but it is out
of the range: D is invalid. Finally, in E, MyProject is related to the integer
23, which is in the range [0, 5000), so E is valid.
43.4. Cardinality Constraints
These constraints control the number of various relationships or property values.
43.4.1. Employees must not work on more than 3 projects.
Constraint
#Constraint in Terp syntax:
#:Employee rdfs:subClassOf (:works_on max 3 :Project)
:Employee rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :works_on;
owl:maxQualifiedCardinality "3"^^xsd:nonNegativeInteger ;
owl:onClass :Project
] .
Database C (invalid)
:Bob a :Employee ;
:works_on :MyProject , :MyProjectFoo , :MyProjectBar , :MyProjectBaz .
:MyProject a :Project .
:MyProjectFoo a :Project .
:MyProjectBar a :Project .
:MyProjectBaz a :Project .
If an RDF instance i is an Employee, then i must not be related via the
property works_on to more than 3 instances of Project. In A, Bob is not
known to be an instance of Employee so the constraint does not apply and A
is valid. In B, Bob is an instance of Employee but is known to work on only
a single project, namely MyProject, so B is valid. In C, Bob is related to 4
instances of Project via works_on.
|
Note
|
Stardog ICV implements a weak form of the unique name assumption, that is, it assumes that things which have different names are, in fact, different things.[37] |
Since Stardog ICV uses closed world (instead of open world) semantics,[38] it assumes that the different projects with different names are, in fact, separate projects, which (in this case) violates the constraint and makes C invalid.
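A short Java sketch shows how the weak unique name assumption turns the max-cardinality constraint into a simple count of distinct names. It is not Stardog's implementation; MaxCardinalityCheck is a hypothetical class, and the data mirrors Database C above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: "employees must not work on more than 3 projects",
// counting differently named projects as different projects (weak UNA).
final class MaxCardinalityCheck {
    static boolean hasType(List<String[]> triples, String subject, String cls) {
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals("rdf:type") && t[2].equals(cls)) {
                return true;
            }
        }
        return false;
    }

    // Number of distinctly named instances of `cls` reached from `subject` via `property`.
    static int countTargets(List<String[]> triples, String subject, String property, String cls) {
        Set<String> names = new HashSet<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(property) && hasType(triples, t[2], cls)) {
                names.add(t[2]);
            }
        }
        return names.size();
    }

    // Non-employees are not constrained; employees may have at most 3 projects.
    static boolean isValid(List<String[]> triples, String subject) {
        if (!hasType(triples, subject, ":Employee")) {
            return true;
        }
        return countTargets(triples, subject, ":works_on", ":Project") <= 3;
    }

    // Database C from the example above.
    static List<String[]> databaseC() {
        return Arrays.asList(new String[][] {
            {":Bob", "rdf:type", ":Employee"},
            {":Bob", ":works_on", ":MyProject"},
            {":Bob", ":works_on", ":MyProjectFoo"},
            {":Bob", ":works_on", ":MyProjectBar"},
            {":Bob", ":works_on", ":MyProjectBaz"},
            {":MyProject", "rdf:type", ":Project"},
            {":MyProjectFoo", "rdf:type", ":Project"},
            {":MyProjectBar", "rdf:type", ":Project"},
            {":MyProjectBaz", "rdf:type", ":Project"}
        });
    }

    public static void main(String[] args) {
        System.out.println(isValid(databaseC(), ":Bob")); // false
    }
}
```

Without the unique name assumption, an open-world reasoner could not rule out that two of the four names denote the same project, so the count alone would not establish a violation.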
43.4.2. Departments must have at least 2 employees.
Constraint
#Constraint in Terp syntax:
#:Department rdfs:subClassOf (inverse :works_in min 2 :Employee)
:Department rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty [owl:inverseOf :works_in] ;
owl:minQualifiedCardinality "2"^^xsd:nonNegativeInteger ;
owl:onClass :Employee
] .
Database C (valid)
:MyDepartment a :Department .
:Alice a :Employee ;
:works_in :MyDepartment .
:Bob a :Employee ;
:works_in :MyDepartment .
This constraint says that if an RDF instance i is a Department, then there
should exist at least 2 instances j and k of class Employee which are
related to i via the property works_in (or, equivalently, i should be
related to them via the inverse of works_in). In A, MyDepartment is not
known to be an instance of Department so the constraint does not apply. In B,
MyDepartment is an instance of Department but only one instance of
Employee, namely Bob, is known to work in it, so B is invalid. In C,
MyDepartment is related to the individuals Bob and Alice, which are both
instances of Employee and (again, due to weak Unique Name Assumption in
Stardog ICV), are assumed to be distinct, so C is valid.
43.4.3. Managers must manage exactly 1 department.
Constraint
#Constraint in Terp syntax:
#:Manager rdfs:subClassOf (:manages exactly 1 :Department)
:Manager rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :manages ;
owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
owl:onClass :Department
] .
Database E (invalid)
:Isabella a :Manager ;
:manages :MyDepartment , :MyDepartment1 .
:MyDepartment a :Department .
:MyDepartment1 a :Department .
This constraint says that if an RDF instance i is a Manager, then it must be
related to exactly 1 instance of Department via the property manages. In A,
the individual Isabella is not known to be an instance of Manager so the
constraint does not apply and A is valid. In B, Isabella is an instance of
Manager but is not related to any instances of Department, so B is invalid.
In C, Isabella is related to the individual MyDepartment via the property
manages but MyDepartment is not known to be an instance of Department, so
C is invalid. In D, Isabella is related to exactly one instance of
Department, namely MyDepartment, so D is valid. Finally, in E, Isabella is
related to two (assumed to be) distinct (again, because of weak UNA) instances
of Department, namely MyDepartment and MyDepartment1, so E is invalid.
43.4.4. Entities may have no more than one name.
Database C (invalid)
:MyDepartment :name "Human Resources" , "Legal" .
This constraint says that no RDF instance i can have more than one assertion via
the data property name. In A, MyDepartment does not have any data property
assertions so A is valid. In B, MyDepartment has a single assertion via
name, so B is also valid. In C, MyDepartment is related to 2
literals, namely "Human Resources" and "Legal", via name, so C is invalid.
43.5. Property Constraints
These constraints control how instances are related to one another via properties.
43.5.1. The manager of a department must work in that department.
Database B (valid)
:Bob :works_in :MyDepartment ;
:manages :MyDepartment .
This constraint says that if an RDF instance i is related to j via the
property manages, then i must also be related to j via the property
works_in. In A, Bob is related to MyDepartment via manages, but not via
works_in, so A is invalid. In B, Bob is related to MyDepartment via both
manages and works_in, so B is valid.
43.5.2. Department managers must supervise all the department’s employees.
Database A (invalid)
:Jose :manages :MyDepartment ;
:is_supervisor_of :Maria .
:Maria :works_in :MyDepartment .
:Diego :works_in :MyDepartment .
Database B (valid)
:Jose :manages :MyDepartment ;
:is_supervisor_of :Maria , :Diego .
:Maria :works_in :MyDepartment .
:Diego :works_in :MyDepartment .
This constraint says that if an RDF instance i is related to j via the
property manages and k is related to j via the property works_in, then
i must be related to k via the property is_supervisor_of. In A, Jose is
related to MyDepartment via manages, Diego is related to MyDepartment
via works_in, but Jose is not related to Diego via any property, so A is
invalid. In B, Jose is related to Maria and Diego--who are both related to
MyDepartment by way of works_in--via the property is_supervisor_of, so B
is valid.
43.6. Complex Constraints
Constraints may be arbitrarily complex and include many conditions.
43.6.1. Employee Constraints
Each employee works on at least one project, or supervises at least one employee that works on at least one project, or manages at least one department.
Constraint
#Constraint in Terp syntax:
#how are you not loving Terp by now?!
#:Employee rdfs:subClassOf ((:works_on some :Project) or
#(:supervises some (:Employee and (:works_on some :Project))) or (:manages some :Department))
:Employee rdfs:subClassOf
[ owl:unionOf ([ a owl:Restriction ;
owl:onProperty :works_on ;
owl:someValuesFrom :Project
]
[ a owl:Restriction ;
owl:onProperty :supervises ;
owl:someValuesFrom
[ owl:intersectionOf (:Employee
[ a owl:Restriction ;
owl:onProperty :works_on ;
owl:someValuesFrom :Project
])
]
]
[ a owl:Restriction ;
owl:onProperty :manages ;
owl:someValuesFrom :Department
])
] .
Database C (valid)
:Esteban a :Employee ;
:supervises :Lucinda .
:Lucinda a :Employee ;
:works_on :MyProject .
:MyProject a :Project .
Database E (valid)
:Esteban a :Employee ;
:manages :MyDepartment ;
:works_on :MyProject .
:MyDepartment a :Department .
:MyProject a :Project .
This constraint says that if an individual i is an instance of
Employee, then at least one of three conditions must be met:
-
it is related to an instance of Project via the property works_on
-
it is related to an instance j via the property supervises and j is an instance of Employee and is also related to some instance of Project via the property works_on
-
it is related to an instance of Department via the property manages.
A and B are invalid because none of the conditions are met. C
meets the second condition: Esteban (who is an Employee) is related
to Lucinda via the property supervises whereas Lucinda is both an
Employee and related to MyProject, which is a Project, via the
property works_on. D meets the third condition: Esteban is related
to an instance of Department, namely MyDepartment, via the property
manages. Finally, E meets the first and the third conditions because
in addition to managing a department Esteban is also related to an
instance of Project, namely MyProject, via the property works_on.
43.6.2. Employees and US government funding
Only employees who are American citizens can work on a project that receives funds from a US government agency.
Constraint
#Constraint in Terp syntax:
#(:Project and (:receives_funds_from some :US_Government_Agency)) rdfs:subClassOf
#(inverse :works_on only (:Employee and (:nationality value "US")))
[ owl:intersectionOf (:Project
[ a owl:Restriction ;
owl:onProperty :receives_funds_from ;
owl:someValuesFrom :US_Government_Agency
])
] rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty [owl:inverseOf :works_on] ;
owl:allValuesFrom [ owl:intersectionOf (:Employee
[ a owl:Restriction ;
owl:hasValue "US" ;
owl:onProperty :nationality
])
]
] .
Database A (valid)
:MyProject a :Project ;
:receives_funds_from :NASA .
:NASA a :US_Government_Agency .
Database B (invalid)
:MyProject a :Project ;
:receives_funds_from :NASA .
:NASA a :US_Government_Agency .
:Andy a :Employee ;
:works_on :MyProject .
Database C (valid)
:MyProject a :Project ;
:receives_funds_from :NASA .
:NASA a :US_Government_Agency .
:Andy a :Employee ;
:works_on :MyProject ;
:nationality "US" .
Database D (invalid)
:MyProject a :Project ;
:receives_funds_from :NASA .
:NASA a :US_Government_Agency .
:Andy a :Employee ;
:works_on :MyProject ;
:nationality "US" .
:Heidi a :Supervisor ;
:works_on :MyProject ;
:nationality "US" .
Database E (valid)
:MyProject a :Project ;
:receives_funds_from :NASA .
:NASA a :US_Government_Agency .
:Andy a :Employee ;
:works_on :MyProject ;
:nationality "US" .
:Heidi a :Supervisor ;
:works_on :MyProject ;
:nationality "US" .
:Supervisor rdfs:subClassOf :Employee .
This constraint says that if an individual i is an instance of Project and
is related to an instance of US_Government_Agency via the property
receives_funds_from, then any individual j which is related to i via the
property works_on must satisfy two conditions:
-
it must be an instance of Employee
-
it must be related to the literal "US" via the data property nationality.
A is valid because there is no individual related to MyProject via works_on,
so the constraint is trivially satisfied. B is invalid since Andy is related
to MyProject via works_on, MyProject is an instance of Project and is
related to an instance of US_Government_Agency, that is, NASA, via
receives_funds_from, but Andy does not have any data property assertions. C
is valid because both conditions are met. D is not valid because Heidi
violates the first condition: she is related to MyProject via works_on but
is not known to be an instance of Employee. Finally, this is fixed in E—by
way of a handy OWL axiom—which states that every instance of Supervisor is an
instance of Employee, so Heidi is inferred to be an instance of Employee
and, consequently, E is valid.[39]
If you made it this far, you deserve a drink!
44. Using ICV Programmatically
Here we describe how to use Stardog ICV via the SNARL APIs. For more information on using SNARL in general, please refer to the chapter on programming Stardog with Java.
There is command-line interface support for many of the operations necessary for using ICV with a Stardog database; please see the Admin chapter for that documentation.
To use ICV in Stardog, one must:
-
create some constraints
-
associate those constraints with a Stardog database
44.1. Creating Constraints
Constraints can be created using the ConstraintFactory, which provides methods for creating integrity constraints. ConstraintFactory expects your constraints, if they are defined as OWL axioms, as RDF triples (or graph). To aid in authoring constraints in OWL, ExpressionFactory is provided for building the RDF equivalent of the OWL axioms of your constraint.
You can also write your constraints in OWL in your favorite editor and load them into the database from your OWL file.
We recommend defining your constraints as OWL axioms, but you are free
to define them using SPARQL SELECT queries. If you choose to define a
constraint using a SPARQL SELECT query, please keep in mind that if your
query returns results, those are interpreted as the violations of the
integrity constraint.
An example of creating a simple constraint using ExpressionFactory:
URI Product = ValueFactoryImpl.getInstance().createURI("urn:Product");
URI Manufacturer = ValueFactoryImpl.getInstance().createURI("urn:Manufacturer");
URI manufacturedBy = ValueFactoryImpl.getInstance().createURI("urn:manufacturedBy");
// we want to say that a product should be manufactured by a Manufacturer:
Constraint aConstraint = ConstraintFactory.constraint(subClassOf(Product,
some(manufacturedBy, Manufacturer)));
44.2. Adding Constraints to Stardog
The ICVConnection interface provides programmatic access to the ICV support in Stardog. It provides support for adding, removing and clearing integrity constraints in your database as well as methods for checking whether or not the data is valid; and when it’s not, retrieving the list of violations.
This example shows how to add an integrity constraint to a Stardog database.
// We'll start out by creating a validator from our SNARL Connection
ICVConnection aValidator = aConn.as(ICVConnection.class);
// add a constraint, which must be done in a transaction.
aValidator.addConstraint(aConstraint);
Here we show how to add a set of constraints as defined in a local OWL ontology.
// We'll start out by creating a validator from our SNARL Connection
ICVConnection aValidator = aConn.as(ICVConnection.class);
// add the set of constraints defined in a local OWL file
aValidator.addConstraints()
.format(RDFFormat.RDFXML)
.file(new File("myConstraints.owl"));
44.3. IC Validation
Checking whether or not the contents of a database are valid is easy. Once you have an ICVConnection you can simply call its isValid() method, which will return whether or not the contents of the database are valid with respect to the constraints associated with that database. Similarly, you can provide some constraints to the isValid() method to see if the data in the database is invalid for those specific constraints, which can be a subset of the constraints associated with the database or new constraints you are working on.
If the data is invalid for some constraints (either the explicit constraints in your database or a new set of constraints you have authored), you can get some information about what the violation was from the SNARL IC Connection. ICVConnection.getViolationBindings() will return the constraints which are violated, and for each constraint, you can get the violations as the set of bindings that satisfied the constraint query. You can turn the bindings into the individuals which are in the violation using ICV.asIndividuals().
44.4. ICV and Transactions
In addition to using the ICVConnection as a data oracle to tell whether or not your data is valid with respect to some constraints, you can also use Stardog’s ICV support to protect your database from invalid data by using ICV as a guard within transactions.
When guard mode for ICV is enabled in Stardog, each commit is inspected to ensure that the contents of the database are valid for the set of constraints that have been associated with it. Should someone attempt to commit data which violates one or more of the constraints defined for the database, the commit will fail and the data will not be added/removed from your database.
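The guard behavior described above can be pictured with a small in-memory sketch. It is not Stardog code and uses none of the SNARL API; GuardedStore is a hypothetical class that validates the would-be state of the store before applying a commit, using "managers must be employees" as the constraint. For brevity it handles additions only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of guard mode: a commit is applied only if the
// resulting data still satisfies the constraint.
final class GuardedStore {
    private final List<String[]> committed = new ArrayList<>();
    private final Predicate<List<String[]>> isValid;

    GuardedStore(Predicate<List<String[]>> isValid) {
        this.isValid = isValid;
    }

    // Returns false and leaves the store unchanged if the additions would violate the constraint.
    boolean commit(List<String[]> additions) {
        List<String[]> candidate = new ArrayList<>(committed);
        candidate.addAll(additions);
        if (!isValid.test(candidate)) {
            return false;
        }
        committed.addAll(additions);
        return true;
    }

    int size() {
        return committed.size();
    }

    // Example constraint: every explicitly typed Manager must also be typed Employee.
    static boolean managersAreEmployees(List<String[]> triples) {
        for (String[] t : triples) {
            if (t[1].equals("rdf:type") && t[2].equals(":Manager")) {
                boolean employee = false;
                for (String[] u : triples) {
                    if (u[0].equals(t[0]) && u[1].equals("rdf:type") && u[2].equals(":Employee")) {
                        employee = true;
                    }
                }
                if (!employee) {
                    return false;
                }
            }
        }
        return true;
    }

    // Runs one rejected and one accepted commit and reports "result size" for each.
    static String demo() {
        GuardedStore store = new GuardedStore(GuardedStore::managersAreEmployees);
        boolean first = store.commit(Arrays.asList(new String[][] {
            {":Alice", "rdf:type", ":Manager"}
        }));
        int afterFirst = store.size();
        boolean second = store.commit(Arrays.asList(new String[][] {
            {":Alice", "rdf:type", ":Manager"},
            {":Alice", "rdf:type", ":Employee"}
        }));
        return first + " " + afterFirst + " " + second + " " + store.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // false 0 true 2
    }
}
```

The first commit is rejected and leaves the store empty; the second carries both type statements and is accepted.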
By default, reasoning is not used when you enable guard mode; however, you are free to specify any of the reasoning types supported by Stardog when enabling guard mode. If you have provided a specific reasoning type for guard mode, it will be used during validation of the integrity constraints. This means you can author your constraints with the expectation of inference results satisfying a constraint.
AdminConnection dbms = AdminConnectionConfiguration.toEmbeddedServer().credentials("admin", "admin").connect();
dbms.disk("icvWithGuard") // disk db named 'icvWithGuard'
.set(DatabaseOptions.ICV_ENABLED, true) // enable icv guard mode
.set(DatabaseOptions.ICV_REASONING_TYPE, ReasoningType.QL) // specify the reasoning level icv guard should use
.create(new File("data/sp2b_10k.n3")); // create the db, bulk loading the file(s) to start
dbms.close();
This illustrates how to create a persistent disk database with ICV guard mode enabled at the QL reasoning type. Guard mode can also be enabled when the database is created on the CLI.
45. Terminology
This chapter may make more sense if you read this section on terminology a few times.
45.1. ICV, Integrity Constraint Validation
The process of checking whether some Stardog database is valid with
respect to some integrity constraints. The result of ICV is a boolean
value (true if valid, false if invalid) and, optionally, an explanation
of constraint violations.
45.2. Schema, TBox
A schema (or "terminology box" a.k.a., TBox) is a set of statements that define the relationships between data elements, including property and class names, their relationships, etc. In practical terms, schema statements for a Stardog database are RDF Schema and OWL 2 terms, axioms, and definitions.
45.3. Data, ABox
All of the triples in a Stardog database that aren’t part of the schema are part of the data (or "assertional box" a.k.a. ABox).
45.4. Integrity Constraint
A declarative expression of some rule or constraint which data must conform to in order to be valid. Integrity Constraints are typically domain and application specific. They can be expressed in OWL 2 (any legal syntax), SWRL rules, or (a restricted form of) SPARQL queries.
45.5. Constraints
Constraints that have been associated with a Stardog database and which are used to validate the data it contains. Each Stardog database may optionally have one and only one set of constraints associated with it.
45.6. Closed World Assumption, Closed World Reasoning
Stardog ICV assumes a closed world with respect to data and constraints:
that is, it assumes that all relevant data is known to it and included
in a database to be validated. It interprets the meaning of Integrity
Constraints in light of this assumption; if a constraint says a value
must be present, the absence of that value is interpreted as a
constraint violation and, hence, as invalid data.
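To make the closed world reading concrete, here is a minimal sketch in plain Java. It does not use Stardog's ICV API; the class name `ClosedWorldCheck`, the method `violations`, and the `urn:` identifiers are all hypothetical and exist only for illustration. The point is that an individual with no known value for a required property is reported as a violation, instead of being treated as "value unknown".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not Stardog's ICV implementation.
final class ClosedWorldCheck {

    // Returns, in sorted order, every individual that is required to have a value
    // but has none in the known data. Under the closed world assumption the
    // absence of a value is itself the violation.
    static List<String> violations(Set<String> mustHaveValue,
                                   Map<String, Set<String>> knownValues) {
        List<String> result = new ArrayList<>();
        for (String individual : mustHaveValue) {
            Set<String> values = knownValues.get(individual);
            if (values == null || values.isEmpty()) {
                result.add(individual);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Constraint (informally): every employee must have a manager.
        Set<String> employees = new HashSet<>(Arrays.asList("urn:alice", "urn:bob"));
        Map<String, Set<String>> managers = new HashMap<>();
        managers.put("urn:alice", Collections.singleton("urn:carol"));

        // urn:bob has no manager in the data, so he is reported as invalid.
        System.out.println(violations(employees, managers));
    }
}
```

Under an open world reading, the same data would not be invalid: the manager of `urn:bob` would simply be unknown.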
45.7. Open World Assumption, Open World Reasoning
A legal OWL 2 inference may violate or satisfy an Integrity Constraint
in Stardog. In other words, you get to have your cake (OWL as a
constraint language) and eat it, too (OWL as modeling or inference
language). This means that constraints are applied to a Stardog database
with respect to an OWL 2 profile.
45.8. Monotonicity
OWL is a monotonic language: that means you can never add anything to a
Stardog database that causes there to be fewer legal inferences. Or, put
another way, the only way to decrease the number of legal inferences is
to delete something.
Monotonicity interacts with ICV in the following ways:
-
Adding data to or removing it from a Stardog database may make it invalid.
-
Adding schema statements to or removing them from a Stardog database may make it invalid.
-
Adding new constraints to a Stardog database may make it invalid.
-
Deleting constraints from a Stardog database cannot make it invalid.
Programming Stardog
46. Sample Code
There’s a Github repo of example Java code that you can fork and use as the starting point for your Stardog projects. Feel free to add new examples using pull requests in Github.
Java Programming
In the Network Programming section, we look at how to interact with Stardog over a network via HTTP and SNARL protocol. In this chapter we describe how to program Stardog from Java using SNARL (Stardog Native API for the RDF Language), Sesame, and Jena. We prefer SNARL to Sesame, and Sesame to Jena, and recommend them in that order, all other things being equal.
If you’re a Spring developer, you might want to read Spring Programming; or, if you prefer an ORM-style approach, you might want to check out Empire, an implementation of JPA for RDF that works with Stardog.
47. Examples
The best way to learn to program Stardog with Java is to study the examples:
We offer some commentary on the interesting parts of these examples below.
47.1. Creating & Administering Databases
AdminConnection provides simple programmatic access to all administrative
functions available in Stardog.
47.1.1. Creating a Database
You can create a basic temporary memory database with Stardog with one line of code:
AdminConnection aAdminConnection = AdminConnectionConfiguration.toEmbeddedServer()
.credentials("admin", "admin")
.connect();
aAdminConnection.createMemory("testConnectionAPI");
// you must always log out of the dbms.
aAdminConnection.close();
|
Note
|
It’s important to always clean up connections to the database by calling AdminConnection#close(). |
You can also use the
memory
and
disk
functions to configure and create a database in any way you prefer.
These methods return
DatabaseBuilder
objects which you can use to configure the options of the database
you’d like to create. Finally, the
create
method takes the list of files to bulk load into the database when you
create it and returns a valid
ConnectionConfiguration
which can be used to create new
Connections
to your database.
|
Warning
|
It is important to note that you must take
care to always log out of the server when you are done working with
AdminConnection.
|
AdminConnection dbms = AdminConnectionConfiguration.toEmbeddedServer().credentials("admin", "admin").connect();
dbms.memory("waldoTest")
.searchable(true)
.create();
dbms.close();
This illustrates how to create a temporary memory database named waldoTest
which supports full text search via Searching.
AdminConnection dbms = AdminConnectionConfiguration.toEmbeddedServer().credentials("admin", "admin").connect();
dbms.disk("icvWithGuard") // disk db named 'icvWithGuard'
.set(DatabaseOptions.ICV_ENABLED, true) // enable icv guard mode
.set(DatabaseOptions.ICV_REASONING_TYPE, ReasoningType.QL) // specify the reasoning level icv guard should use
.create(new File("data/sp2b_10k.n3")); // create the db, bulk loading the file(s) to start
dbms.close();
This illustrates how to create a persistent disk database with ICV guard mode
enabled at the QL reasoning type. For more information on what the available
options for set are and what they mean, see the Database Admin section.
Also note, Stardog database administration can be performed from the CLI.
47.2. Creating a Connection String
As you can see, the ConnectionConfiguration class in the com.clarkparsia.stardog.api package is where the initial action takes place:
Connection aConn = ConnectionConfiguration
.to("noReasoningExampleTest") // the name of the db to connect to
.credentials("admin", "admin") // credentials to use while connecting
.connect();
The
to
method takes a Database Name as a string; and then
connect
connects to the database using all specified properties on the
configuration. This class and its constructor methods are used for all of
Stardog’s Java APIs: SNARL native Stardog API, Sesame, Jena, as well as HTTP and
SNARL protocol. In the latter cases, you must also call
server
and pass it a valid URL to the Stardog server using the HTTP or SNARL protocols.
Without the call to server, ConnectionConfiguration will attempt
to connect to a local, embedded version of the Stardog server. The
Connection still operates in the standard client-server mode, the only
difference is that the server is running in the same JVM as your
application.
|
Note
|
Whether using SNARL, Sesame, or Jena, most, perhaps all,
Stardog Java code will use ConnectionConfiguration to get a handle on
a Stardog database—whether embedded or remote—and, after getting that
handle, can use the appropriate API.
|
See the
ConnectionConfiguration
API docs or How to Make a Connection String for more information.
47.3. Managing Security
We discuss the security system in Stardog in Security. When logged into the Stardog DBMS you can access all security-related features detailed in the security section using any of the core security interfaces for managing users, roles, and permissions.
47.4. Using SNARL
In examples 1 and 4 above, you can see how to use SNARL in Java to interact with Stardog. The SNARL API will give the best performance overall and is the native Stardog API. It uses some Sesame domain classes but is otherwise a clean-sheet API and implementation.
The SNARL API is fluent with the aim of making code written for Stardog easier to write and easier to maintain. Most objects are easily re-used to make basic tasks with SNARL as simple as possible. We are always interested in feedback on the API, so if you have suggestions or comments, please send them to the mailing list.
Let’s take a closer look at some of the interesting parts of SNARL.
47.5. Adding Data
aConn.begin();
aConn.add().io()
.format(RDFFormat.N3)
.stream(new FileInputStream("data/sp2b_10k.n3"));
Graph aGraph = Graphs.newGraph(ValueFactoryImpl.getInstance()
.createStatement(ValueFactoryImpl.getInstance().createURI("urn:subj"),
ValueFactoryImpl.getInstance().createURI("urn:pred"),
ValueFactoryImpl.getInstance().createURI("urn:obj")));
Resource aContext = ValueFactoryImpl.getInstance().createURI("urn:test:context");
aConn.add().graph(aGraph, aContext);
aConn.commit();
You must always enclose changes to a database within a transaction: begin, followed by commit or rollback. Changes are local until the transaction is committed or until you try to perform a query operation to inspect the state of the database within the transaction.
By default, RDF added will go into the default context unless specified
otherwise. As shown, you can use
Adder directly
to add statements and graphs to the database; and if you want to add
data from a file or input stream, you use the
io, format,
and stream chain of method invocations.
See the SNARL API Javadocs for all the gory details.
47.6. Removing Data
// first start a transaction
aConn.begin();
aConn.remove().io()
.format(RDFFormat.N3)
.file(new File("data/remove_data.nt"));
// and commit the change
aConn.commit();
Let’s look at
removing data
via SNARL; in the example above, you can see that file or stream-based
removal is symmetric to file or stream-based addition, i.e., calling
remove in an io chain with a file or stream call. See the SNARL
API docs for more details about finer-grained deletes, etc.
47.7. Parameterized SPARQL Queries
URI aURI = ValueFactoryImpl.getInstance()
.createURI("http://localhost/publications/articles/Journal1/1940/Article1");
SelectQuery aQuery = aConn.select("select * where {?s ?p ?o}");
// now we can run this query...but lets set a limit on it since otherwise that'd be the whole database
aQuery.limit(10);
TupleQueryResult aResult = aQuery.execute();
System.out.println("The first ten results...");
// and do something with the results
while (aResult.hasNext()) {
System.out.println(aResult.next());
}
// always close your result sets
aResult.close();
// query objects are easily parameterized, we can bind the "s" variable in the previous query
// with a specific value
aQuery.parameter("s", aURI);
// and remove the limit
aQuery.limit(SelectQuery.NO_LIMIT);
// now we can re-run the query
aResult = aQuery.execute();
System.out.println("\nNow a particular slice...");
while (aResult.hasNext()) {
System.out.println(aResult.next());
}
aResult.close();
SNARL also lets us parameterize SPARQL queries. We can make a Query object by
passing a SPARQL query in the constructor. Simple. Obvious.
Next, let’s set a limit for the results: aQuery.limit(10); or if we
want no limit, aQuery.limit(SelectQuery.NO_LIMIT). By default, there is no
limit imposed on the query object; we’ll use whatever is specified in
the query. You can use limit to override any limit specified in the
query. However, specifying NO_LIMIT will not remove a limit specified in
a query; it will only remove any limit override you’ve specified,
restoring the state to the default of using whatever is in the query.
We can execute that query with execute and iterate over the
results. We can also rebind the "?s" variable easily:
aQuery.parameter("s", aURI), which will work for all instances of "?s"
in any BGP in the query, and you can specify null to remove the
binding.
Query objects are re-useable, so you can create one from your original query string and alter bindings, limit, and offset in any way you see fit and re-execute the query to get the updated results.
We strongly recommend the use of SNARL’s parameterized queries over concatenating strings together in order to build your SPARQL query. This latter approach opens up the possibility for SPARQL injection attacks unless you are very careful in scrubbing your input.[40]
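To see why concatenation is risky, consider the following minimal sketch in plain Java. It does not use SNARL; the class `SparqlInjectionDemo`, the query template, and the `<urn:name>` predicate are hypothetical. The first method splices untrusted text into a query string. The second shows the kind of scrubbing you would have to get right yourself if you insisted on concatenation; it covers only backslash, double quote, and line breaks, and should not be read as a complete sanitizer.

```java
// Hypothetical illustration of SPARQL injection; not part of the SNARL API.
final class SparqlInjectionDemo {

    // Naive approach: untrusted input is spliced straight into the query text.
    static String concatenated(String name) {
        return "select ?s where { ?s <urn:name> ?n . FILTER(?n = \"" + name + "\") }";
    }

    // Partial scrubbing for a double-quoted SPARQL string literal.
    // The backslash must be escaped first so later escapes are not doubled.
    static String escapeLiteral(String raw) {
        return raw.replace("\\", "\\\\")
                  .replace("\"", "\\\"")
                  .replace("\n", "\\n")
                  .replace("\r", "\\r");
    }

    public static void main(String[] args) {
        // The input closes the literal early, widens the filter, and comments out the rest.
        String hostile = "x\" || true) } #";

        // The filter now reads FILTER(?n = "x" || true), which matches every value.
        System.out.println(concatenated(hostile));

        // With escaping, the whole input stays inside one string literal.
        System.out.println(concatenated(escapeLiteral(hostile)));
    }
}
```

With a parameterized query there is nothing to escape: the value is bound to the variable as a value and is never parsed as query text.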
47.8. Getter Interface
// The previous query was just getting the statements which the value of aURI is the subject,
// which we can easily do via the getter interface
Iteration<Statement, StardogException> aIter = aConn.get().subject(aURI).iterator();
System.out.println("\nOr you can use a getter to do the same thing...");
while (aIter.hasNext()) {
System.out.println(aIter.next());
}
// always close your iterations as well...
aIter.close();
// Getter objects are parameterizable like queries, and can be reused
Getter aGetter = aConn.get();
aGetter.predicate(RDF.TYPE);
// calling iterator() on this getter will return all statements which have RDF.TYPE as the predicate
// or we can bind the subject and get a specific type statement...
aGetter.subject(aURI);
// this will return the type triple for aURI as an Iteration
aIter = aGetter.iterator();
System.out.println("\nJust a single statement now...");
while (aIter.hasNext()) {
System.out.println(aIter.next());
}
aIter.close();
// revert having set the predicate on the getter
aGetter.predicate(null);
// we can also get the results as a graph:
aGraph = aGetter.graph();
System.out.println("\nFinally, the same results as earlier, but as a graph...");
GraphIO.writeGraph(aGraph, new OutputStreamWriter(System.out), RDFFormat.TURTLE);
SNARL also supports some sugar for the classic statement-level
getSPO (scars, anyone?) interactions. We ask in the first line of
the snippet above for an iterator over the Stardog connection, based on
aURI in the subject position. Then a while-loop, as one might
expect. You can also parameterize Getters by binding different positions of the
Getter, which acts like a kind of RDF statement filter, and then iterating as
usual.
|
Note
|
Always call aIter.close when you are done; this is important for Stardog databases to
avoid memory leaks. If you need to materialize the iterator as a graph, you can
do that by calling graph.
|
The snippet doesn’t show object or context parameters on a
Getter, but those work, too, in the obvious way.
47.9. Reasoning
Stardog supports query-time reasoning using a
query rewriting technique. In short, when reasoning is requested, a query
is automatically rewritten to n queries, which are then executed. As
we discuss below in Connection Pooling, reasoning is enabled at the
Connection layer and then any queries executed over that connection
are executed with reasoning enabled; you don’t need to do anything up
front when you create your database if you want to use reasoning.
ReasoningConnection aReasoningConn = ConnectionConfiguration
.to("reasoningExampleTest")
.credentials("admin", "admin")
.reasoning(ReasoningType.QL)
.connect()
.as(ReasoningConnection.class);
In this code example, you can see that it’s trivial to enable reasoning
for a Connection: simply call reasoning with the appropriate
constant such as ReasoningType.QL passed in. In addition to OWL2 QL,
EL, and RL, Stardog supports OWL2 DL schema queries. Stardog also
supports SWRL and Stardog Rules, too.
47.10. Search
Stardog’s search system can be used from Java. The fluent Java API
for searching in SNARL looks a lot like the other search interfaces: We create a
Searcher instance with a fluent constructor: limit sets a limit on the
results; query contains the search query, and threshold sets a minimum
threshold for the results.
Searcher aSearch = aSearchConn.search()
.limit(50) // as before we only want the top fifty results
.query("mac") // our search term
.threshold(0.5); // Since Waldo is implemented over lucene, we can also specify a min threshold for our results
SearchResults aSearchResults = aSearch.search();
// and now we can just iterate over the search results
Iteration<SearchResult, QueryEvaluationException> resultIt = aSearchResults.iteration();
System.out.println("\nAPI results: ");
while (resultIt.hasNext()) {
SearchResult aHit = resultIt.next();
System.out.println(aHit.getHit() + " with a score of: " + aHit.getScore());
}
// don't forget to close your iteration!
resultIt.close();
// we can also re-use the searcher if we want to find the next set of results...
aSearch.offset(50); // we already found the first fifty, so lets grab the next set
aSearchResults = aSearch.search();
// we can now check the next page of search results!
Then we call the search method of our Searcher instance and
iterate over the results i.e., SearchResults. Last, we can use
offset on an existing Searcher to grab another page of results.
Stardog also supports performing searches over the full-text index within a
SPARQL query via the LARQ SPARQL
syntax. This provides a powerful mechanism for querying both your RDF index and
full-text index at the same time while also giving you a more performant alternative
to the SPARQL regex filter.
47.11. SNARL Connection Views
SNARL
Connections
support obtaining a specified type of Connection. This lets you extend and
enhance the features available to a Connection while maintaining the standard,
simple Connection API. The Connection
as method
takes as a parameter the interface, which must be a sub-type of a Connection,
that you would like to use. as will either return the Connection as the view
you’ve specified, or it will throw an exception if the view could not be
obtained for some reason.
An example of obtaining an instance of a
SearchConnection
to use Stardog’s full-text search support would look like this:
SearchConnection aSearchConn = aConn.as(SearchConnection.class);
47.12. SNARL API Docs
Please see SNARL API docs for more information.
48. Using Sesame
Stardog supports the Sesame API; thus, for the most part, using Stardog and Sesame is not much different from using Sesame with other RDF databases. There are, however, at least two differences worth pointing out.
48.1. Wrapping connections with StardogRepository
// Create a Sesame Repository from a Stardog ConnectionConfiguration. The configuration will be used
// when creating new RepositoryConnections
Repository aRepo = new StardogRepository(ConnectionConfiguration
.to("testSesame")
.credentials("admin", "admin"));
// init the repo
aRepo.initialize();
// now you can use it like a normal Sesame Repository
RepositoryConnection aRepoConn = aRepo.getConnection();
// always best to turn off auto commit
aRepoConn.setAutoCommit(false);
As you can see from the code snippet, once you’ve created a
ConnectionConfiguration with all the details for connecting to a
Stardog database, you can wrap that in a StardogRepository which is a
Stardog-specific implementation of the Sesame Repository interface. At
this point, you can use the resulting Repository like any other Sesame
Repository implementation. Each time you call
Repository.getConnection, your original ConnectionConfiguration will
be used to spawn a new connection to the database.
48.2. Autocommit
Stardog’s RepositoryConnection implementation will, by default, disable
autoCommit status. When enabled, every single statement added or
deleted via the Connection will incur the cost of a transaction, which
is too heavyweight for most use cases. You can enable
autoCommit and it will work as expected; but we recommend
leaving it disabled.
49. Using Jena
Stardog supports Jena via a Sesame-Jena bridge, so it has more overhead than Sesame or SNARL. YMMV. There are two points in the Jena example to emphasize.
49.1. Init in Jena
// obtain a Jena model for the specified stardog database connection. Just creating an in-memory
// database; this is roughly equivalent to ModelFactory.createDefaultModel.
Model aModel = SDJenaFactory.createModel(aConn);
The initialization in Jena is a bit different from either SNARL or
Sesame; you can get a Jena Model instance by passing the Connection
instance returned by ConnectionConfiguration to the Stardog factory,
SDJenaFactory.
49.2. Add in Jena
// start a transaction before adding the data. This is not required,
// but it is faster to group the entire add into a single transaction rather
// than rely on the auto commit of the underlying stardog connection.
aModel.begin();
// read data into the model. note, this will add one statement at a time.
// Bulk loading needs to be performed directly with the BulkUpdateHandler provided
// by the underlying graph, or by reading in files in RDF/XML format, which uses the
// bulk loader natively. Alternatively, you can load data into the Stardog
// database using SNARL, or via the command line client.
aModel.getReader("N3").read(aModel, new FileInputStream("data/sp2b_10k.n3"), "");
// done!
aModel.commit();
Jena also wants to add data to a Model one statement at a time, which
can be less than ideal. To work around this restriction, we recommend
adding data to a Model in a single Stardog transaction, which is
initiated with aModel.begin. Then, to read data into the model, we
recommend either using RDF/XML, since that triggers the BulkUpdateHandler in
Jena, or grabbing a BulkUpdateHandler directly from the underlying Jena
graph.
The other options are to use the Stardog CLI client to bulk load a Stardog database, or to use SNARL for loading and then switch to Jena for other operations, processing, query, etc.
50. Client-Server Stardog
Using Stardog from Java in either embedded or
client-server mode is very similar; the only visible difference
is whether a server URL is set in the ConnectionConfiguration: when it’s present,
we’re in client-server mode; else, we’re in embedded mode.
That’s a good and a bad thing: it’s good because the code is symmetric and uniform. It’s bad because it can make reasoning about performance difficult, i.e., it’s not entirely clear in client-server mode which operations trigger or don’t trigger a round trip with the server and, thus, which may be more expensive than they are in embedded mode.
In client-server mode, everything triggers a round trip with these exceptions:
-
closing a connection outside a transaction
-
any parameterization or other modification of a query or getter instance
-
any database state mutations in a transaction that don’t need to be immediately visible to the transaction; that is, changes are sent to the server only when they are required, on commit, or on any query or read operation that needs to have the accurate up-to-date state of the data within the transaction.
Stardog generally tries to be as lazy as possible; but in client-server mode, since state is maintained on the client, there are fewer chances to be lazy and more interactions with the server.
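The third exception above can be pictured with a toy model in plain Java. This is only a sketch of the idea of deferring writes, not Stardog's client implementation; the class `LazyTransactionSketch` and its round trip accounting are hypothetical, and it counts only the operations it models (it ignores begin, for instance).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of lazy change propagation; not Stardog's client code.
final class LazyTransactionSketch {
    private final List<String> pending = new ArrayList<>();
    private int roundTrips = 0;

    // Mutations are only buffered on the client: no round trip yet.
    void add(String statement) {
        pending.add(statement);
    }

    // Buffered changes are sent in one batch, and only if there are any.
    private void flush() {
        if (!pending.isEmpty()) {
            roundTrips++;
            pending.clear();
        }
    }

    // A query must see up-to-date state, so pending changes go first.
    void query(String sparql) {
        flush();
        roundTrips++;
    }

    // Commit also forces out anything still buffered.
    void commit() {
        flush();
        roundTrips++;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        LazyTransactionSketch tx = new LazyTransactionSketch();
        tx.add("<urn:a> <urn:p> <urn:b>");
        tx.add("<urn:b> <urn:p> <urn:c>");
        System.out.println("after two adds: " + tx.roundTrips());
        tx.query("select * where {?s ?p ?o}");
        System.out.println("after query: " + tx.roundTrips());
        tx.commit();
        System.out.println("after commit: " + tx.roundTrips());
    }
}
```

In this model, interleaving reads with writes inside a transaction costs more round trips than batching all writes before the first read.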
51. Connection Pooling
Stardog supports connection pools for SNARL Connection objects for
efficiency and programmer sanity. Here’s how they work:
Server aServer = Stardog
.buildServer()
.bind(SNARLProtocolConstants.EMBEDDED_ADDRESS)
.start();
// First create a temporary database to use (if there is one already, drop it first)
AdminConnection aAdminConnection = AdminConnectionConfiguration.toEmbeddedServer().credentials("admin", "admin").connect();
if (aAdminConnection.list().contains("testConnectionPool")) {
aAdminConnection.drop("testConnectionPool");
}
aAdminConnection.createMemory("testConnectionPool");
aAdminConnection.close();
// Now, we need a configuration object for our connections, this is all the information about
// the database that we want to connect to.
ConnectionConfiguration aConnConfig = ConnectionConfiguration
.to("testConnectionPool")
.credentials("admin", "admin");
// We want to create a pool over these objects. See the javadoc for ConnectionPoolConfig for
// more information on the options and information on the defaults.
ConnectionPoolConfig aConfig = ConnectionPoolConfig
.using(aConnConfig) // use my connection configuration to spawn new connections
.minPool(10) // the number of objects to start my pool with
.maxPool(1000) // the maximum number of objects that can be in the pool (leased or idle)
.expiration(1, TimeUnit.HOURS) // Connections can expire after being idle for 1 hr.
.blockAtCapacity(1, TimeUnit.MINUTES); // I want obtain to block for at most 1 min while trying to obtain a connection.
// now i can create my actual connection pool
ConnectionPool aPool = aConfig.create();
// if I want a connection object...
Connection aConn = aPool.obtain();
// now I can feel free to use the connection object as usual...
// and when I'm done with it, instead of closing the connection, I want to return it to the pool instead.
aPool.release(aConn);
// and when I'm done with the pool, shut it down!
aPool.shutdown();
// you MUST stop the server if you've started it!
aServer.stop();
Per standard practice, we first start the server and create the database, in
this case testConnectionPool. Then we set up a
ConnectionPoolConfig, using its fluent API, which establishes the parameters
of the pool:
using
|
Sets which ConnectionConfiguration we want to pool; this is what is used to actually create the connections. |
minPool, maxPool
|
Establishes min and max pooled objects; max pooled objects includes both leased and idled objects. |
expiration
|
Sets the idle life of objects; in this case, the pool reclaims objects idled for 1 hour. |
blockAtCapacity
|
Sets the max time in minutes that we’ll block waiting for an object when there aren’t any idle ones in the pool. |
Whew! Next we can create the pool using this ConnectionPoolConfig
thing.
Finally, we call obtain on the ConnectionPool when we need a new
connection. And when we’re done with it, we return it to the pool so it can be
re-used, by calling release. When we’re done, we shutdown the
pool.
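If the obtain and release lifecycle is unfamiliar, the following plain Java sketch shows the general pattern with a bounded pool. It is not Stardog's ConnectionPool: the class `SimplePool` is hypothetical, it reuses the names minPool, maxPool, obtain, and release only to mirror the description above, and it leaves out expiration and shutdown entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical, simplified pool; not Stardog's ConnectionPool implementation.
final class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Semaphore permits;   // one permit per object that may be leased
    private final Supplier<T> factory;
    private final long blockMillis;

    SimplePool(Supplier<T> factory, int minPool, int maxPool, long blockTime, TimeUnit unit) {
        this.factory = factory;
        this.permits = new Semaphore(maxPool);
        this.blockMillis = unit.toMillis(blockTime);
        for (int i = 0; i < minPool; i++) {
            idle.push(factory.get());  // pre-create the starting objects
        }
    }

    // Blocks for at most the configured time when the pool is at capacity.
    T obtain() {
        try {
            if (!permits.tryAcquire(blockMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("pool at capacity");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        synchronized (idle) {
            return idle.isEmpty() ? factory.get() : idle.pop();
        }
    }

    // Returns the object to the idle set instead of closing it.
    void release(T object) {
        synchronized (idle) {
            idle.push(object);
        }
        permits.release();
    }

    public static void main(String[] args) {
        SimplePool<Object> pool = new SimplePool<>(Object::new, 1, 2, 10, TimeUnit.MILLISECONDS);
        Object first = pool.obtain();
        Object second = pool.obtain();
        try {
            pool.obtain();             // a third lease exceeds maxPool
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }
        pool.release(first);
        System.out.println("reused: " + (pool.obtain() == first));
        pool.release(second);
    }
}
```

The semaphore bounds leased objects at maxPool, so a caller who forgets to release eventually starves everyone else; that is the practical reason to always pair obtain with release.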
Since reasoning in Stardog is enabled per Connection, you
can create two pools: one with reasoning connections, one with
non-reasoning connections; and then use the one you need to have
reasoning per query; never pay for more than you need.
52. API Deprecation
Methods and classes in the SNARL API that are marked with the
com.google.common.annotations.Beta annotation are subject to change or removal in
any release. We are using this annotation to denote new or experimental
features, the behavior or signature of which may change significantly
before they are out of "beta".
We will otherwise attempt to keep the public APIs as stable as possible,
and methods will be marked with the standard @Deprecated annotation
for at least one full revision cycle before their removal from the SNARL
API. See Compatibility Policies for more information about API stability.
Anything marked @VisibleForTesting is just that, visible as a
consequence of test case requirements; don’t write any important code
that depends on functions with this annotation.
53. Support for Maven
Like Maven-generated archives, Stardog client JARs contain Maven
meta information in pom.xml and pom.properties files. Dependency information
is included in the pom.xml files and the pom.properties files include some
basic properties. Located in the Stardog distribution bin directory, the
script mavenInstall (and mavenInstall.bat for Windows systems) will install
the Stardog client JARs into the local Maven repository.
|
Note
|
Only client dependencies are provided; they do not cover running the server in embedded mode. For those use cases, the server JARs must still be included. |
The following table summarizes the type of client to be built and its
associated Stardog dependency. The stardog dependency list below follows
the Gradle convention and is of the form:
groupId:artifactId:version. Versions 2.1 and higher are supported.
Type of Client |
Stardog Dependency |
SNARL client |
|
HTTP client |
|
reasoning snarl client |
|
reasoning http client |
|
search snarl client |
|
search http client |
|
ICV SNARL client |
|
ICV HTTP client |
|
Empire client |
|
Jena SNARL client |
|
Jena HTTP client |
|
Sesame SNARL client |
|
Sesame HTTP client |
|
Network Programming
In the Java Programming section, we consider interacting with Stardog programmatically from a Java program. In this section we consider interacting with Stardog over HTTP. In some use cases or deployment scenarios, it may be necessary to interact with or control Stardog remotely over an IP-based network.
Stardog supports SPARQL 1.0 HTTP Protocol; the SPARQL 1.1 Graph Store HTTP Protocol; the Stardog HTTP Protocol; and SNARL, an RPC-style protocol based on Google Protocol Buffers.
54. SPARQL Protocol
Stardog supports the standard SPARQL Protocol HTTP bindings, as well as additional functionality via HTTP. Stardog also supports SPARQL 1.1’s Service Description format. See the spec if you want details.
54.1. Stardog HTTP Protocol
The Stardog HTTP Protocol supports SPARQL Protocol 1.1 and additional resource representations and capabilities. The Stardog HTTP API v4 is also available on Apiary: http://docs.stardog.apiary.io/. The Stardog Linked Data API (aka "Annex") is also documented on Apiary: http://docs.annex.apiary.io/.
54.1.1. Generating URLs
Suppose you are running the HTTP server at
http://localhost:12345/
To form the URI of a particular Stardog Database, the Database Short
Name is the first URL path segment appended to the deployment URI. For
example, for the Database called cytwombly, deployed in the above
example HTTP server, the Database Network Name might be
http://localhost:12345/cytwombly
All the resources related to this database are identified by URL path segments relative to the Database Network Name; hence:
http://localhost:12345/cytwombly/size
In what follows, we use URI Template
notation to parameterize the actual request URLs, thus: /{db}/size.
We also abuse notation to show the permissible HTTP request types and default
MIME types in the following way: REQ | REQ /resource/identifier → mime_type |
mime_type. In a few cases, we use void as short hand for the case where there
is a response code but the response body may be empty.
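The URL scheme described above can be sketched in a few lines of Java. The helper below is hypothetical (the class `StardogUrls` and method `resource` are not part of any Stardog client library); it assumes a non-empty server URL and that the database name and resource segments need no URL escaping.

```java
import java.net.URI;

// Hypothetical helper illustrating the /{db}/... URL layout; not a Stardog API.
final class StardogUrls {

    // Appends the database short name as the first path segment of the
    // deployment URL, followed by any resource segments such as "size".
    static URI resource(String serverUrl, String db, String... segments) {
        StringBuilder url = new StringBuilder(serverUrl);
        if (url.charAt(url.length() - 1) != '/') {
            url.append('/');
        }
        url.append(db);
        for (String segment : segments) {
            url.append('/').append(segment);
        }
        return URI.create(url.toString());
    }

    public static void main(String[] args) {
        // The Database Network Name for the example database
        System.out.println(resource("http://localhost:12345/", "cytwombly"));
        // The /{db}/size resource
        System.out.println(resource("http://localhost:12345/", "cytwombly", "size"));
    }
}
```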
54.2. HTTP Headers: Content-Type & Accept
All HTTP requests that are mutative (add or remove) must include a valid
Content-Type header set to the MIME type of the request body, where
"valid" is a valid MIME type for N-Triples, Trig, Trix, Turtle, NQuads,
JSON-LD, or RDF/XML:
| RDF/XML |
|
| Turtle |
|
| N-Triples |
|
| TriG |
|
| TriX |
|
| NQuads |
|
| JSON-LD |
|
SPARQL CONSTRUCT queries must also include an Accept header set to one of these RDF serialization types.
When issuing a SELECT query the Accept header should be set to one
of the valid MIME types for SELECT results:
| SPARQL XML Results Format |
|
| SPARQL JSON Results Format |
|
| SPARQL Boolean Results |
|
| SPARQL Binary Results |
|
54.3. Response Codes
Stardog uses the following HTTP response codes:
200
|
Operation has succeeded. |
202
|
Operation was received successfully and will be processed shortly. |
400
|
Indicates parse errors or that the transaction identifier specified for an operation is invalid or does not correspond to a known transaction. |
401
|
Request is unauthorized. |
403
|
User attempting to perform the operation does not exist, their username or password is invalid, or they do not have the proper credentials to perform the action. |
404
|
A resource involved in the request—for example the database or transaction—does not exist. |
409
|
A conflict for some database operations; for example, creating a database that already exists. |
500
|
An unspecified failure in some internal operation…Call your office, Senator! |
There are also Stardog-specific error codes in the SD-Error-Code header in the
response from the server. These can be used to further clarify the reason for
the failure on the server, especially in cases where it could be ambiguous. For
example, if you received a 404 from the server trying to commit a transaction
denoted by the path /myDb/transaction/commit/293845klf9f934…it’s probably
not clear what is missing: it’s either the transaction or the database. In this
case, the value of the SD-Error-Code header will clarify.
The enumeration of SD-Error-Code values and their meanings is as follows:
| 0 | Authentication error |
| 1 | Authorization error |
| 2 | Query evaluation error |
| 3 | Query contained parse errors |
| 4 | Query is unknown |
| 5 | Transaction not found |
| 6 | Database not found |
| 7 | Database already exists |
| 8 | Database name is invalid |
| 9 | Resource (user, role, etc) already exists |
| 10 | Invalid connection parameter(s) |
| 11 | Invalid database state for the request |
| 12 | Resource in use |
| 13 | Resource not found |
| 14 | Operation not supported by the server |
| 15 | Password specified in the request was invalid |
In cases of error, the message body of the result will include any error information provided by the server to indicate the cause of the error.
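To make the role of SD-Error-Code concrete, here is a minimal sketch of how a client could combine the HTTP status with the header to produce a readable diagnosis. The code table is taken from the enumeration above; the function name `describe_failure` is hypothetical, and the sketch assumes the header value is the decimal code as a string and that `headers` is a plain dictionary keyed by the exact header name.

```python
# Meanings copied from the SD-Error-Code enumeration in this section.
SD_ERROR_CODES = {
    0: "Authentication error",
    1: "Authorization error",
    2: "Query evaluation error",
    3: "Query contained parse errors",
    4: "Query is unknown",
    5: "Transaction not found",
    6: "Database not found",
    7: "Database already exists",
    8: "Database name is invalid",
    9: "Resource (user, role, etc) already exists",
    10: "Invalid connection parameter(s)",
    11: "Invalid database state for the request",
    12: "Resource in use",
    13: "Resource not found",
    14: "Operation not supported by the server",
    15: "Password specified in the request was invalid",
}


def describe_failure(status, headers):
    """Return a one-line description of a failed response."""
    raw = headers.get("SD-Error-Code")
    if raw is None:
        return "HTTP %d" % status
    try:
        code = int(raw)  # assumption: the header carries a decimal integer
    except ValueError:
        return "HTTP %d (unrecognized SD-Error-Code %r)" % (status, raw)
    meaning = SD_ERROR_CODES.get(code, "unknown code")
    return "HTTP %d: SD-Error-Code %d (%s)" % (status, code, meaning)
```

For the ambiguous 404 described above, `describe_failure(404, {"SD-Error-Code": "5"})` reports a missing transaction, while a value of `"6"` reports a missing database.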
55. Stardog Resources
To interact with Stardog over HTTP, use the following resource representations, HTTP response codes, and resource identifiers.
55.1. A Stardog Database
GET /{db} → void
Returns a representation of the database. As of Stardog 2.2.4, this is merely a placeholder; in a later release, this resource will serve the web console where the database can be interacted with in a browser.
55.3. Query Evaluation
GET | POST /{db}/query
The SPARQL endpoint for the database. The valid Accept types are listed above in the HTTP Headers section.
To issue SPARQL queries with reasoning over HTTP, see Using Reasoning.
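A query request against this endpoint might be assembled as in the sketch below, which only builds the request object and does not send it. The `query` parameter name follows the SPARQL 1.1 Protocol and is an assumption about this endpoint; the base URL and the helper name `build_query_request` are placeholders, not part of Stardog.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_query_request(base_url, db, sparql, accept):
    """Build (but do not send) a GET request for /{db}/query."""
    # Assumption: the query text travels in a "query" URL parameter,
    # as in the SPARQL 1.1 Protocol.
    url = "%s/%s/query?%s" % (
        base_url.rstrip("/"),
        quote(db, safe=""),
        urlencode({"query": sparql}),
    )
    return Request(url, headers={"Accept": accept}, method="GET")


# Placeholder server address and database name.
request = build_query_request(
    "http://localhost:5822",
    "testdb",
    "SELECT * { ?s ?p ?o }",
    "application/sparql-results+json",
)
```

Passing the request to `urllib.request.urlopen` would send it; authentication is omitted here for brevity.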
55.4. SPARQL update
GET | POST /{db}/update
The SPARQL endpoint for updating the database with SPARQL Update. The valid Content-Type values are
application/sparql-update or application/x-www-form-urlencoded.
55.5. Query Plan
GET | POST /{db}/explain → text/plain
Returns the explanation for the execution of a query, i.e., a query
plan. All the same arguments as for Query Evaluation are legal here; but
the only MIME type for the Query Plan resource is text/plain.
55.6. Transaction Begin
POST /{db}/transaction/begin → text/plain
Returns a transaction identifier resource as text/plain, which is
likely to be deprecated in a future release in favor of a hypertext
format. POST to begin a transaction accepts neither body nor arguments.
55.6.1. Transaction Security Considerations
|
Warning
|
Stardog’s implementation of transactions with HTTP is vulnerable to man-in-the-middle attacks, which could be used to violate Stardog’s isolation guarantee (among other nasty side effects). |
Stardog’s transaction identifiers are 64-bit GUIDs and, thus, pretty hard to guess; but if you can grab a response in-flight, you can steal the transaction identifier if basic access auth or RFC 2069 digest auth is in use. You’ve been warned.
In a future release, Stardog will use RFC 2617 HTTP Digest Authentication, which is less vulnerable to various attacks and will never ask a client to use a different authentication type, which should lessen the likelihood of MitM attacks for properly restricted Stardog clients—that is, a Stardog client that treats any request by a proxy server or origin server (i.e., Stardog) to use basic access auth or RFC 2069 digest auth as a MitM attack. See RFC 2617 for more information.
55.7. Transaction Commit
POST /{db}/transaction/commit/{txId} → void | text/plain
Returns a representation of the committed transaction; 200 means the
commit was successful. Otherwise a 500 error indicates the commit
failed and the text returned in the result is the failure message.
As you might expect, failed commits exit cleanly, rolling back any changes that were made to the database.
55.8. Transaction Rollback
POST /{db}/transaction/rollback/{txId} → void | text/plain
Returns a representation of the transaction after it’s been rolled back.
200 means the rollback was successful, otherwise 500 indicates the
rollback failed and the text returned in the result is the failure
message.
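The begin, commit, and rollback resources above can be sketched as a small set of request builders. This is only an illustration under stated assumptions: the helper names are hypothetical, the base URL is a placeholder, and nothing is sent over the network.

```python
from urllib.parse import quote
from urllib.request import Request


def _tx_url(base_url, db, *parts):
    """Join /{db}/transaction/... with each path segment URL-escaped."""
    segments = (db, "transaction") + parts
    path = "/".join(quote(segment, safe="") for segment in segments)
    return base_url.rstrip("/") + "/" + path


def begin_request(base_url, db):
    """POST /{db}/transaction/begin takes neither body nor arguments."""
    return Request(_tx_url(base_url, db, "begin"), method="POST")


def end_request(base_url, db, tx_id, commit=True):
    """POST to the commit or rollback resource for transaction tx_id."""
    verb = "commit" if commit else "rollback"
    return Request(_tx_url(base_url, db, verb, tx_id), method="POST")


def describe_outcome(status, body_text):
    """200 means success; otherwise the body carries the failure message."""
    if status == 200:
        return "ok"
    return "failed (%d): %s" % (status, body_text)
```

A client would read the transaction identifier from the text/plain body of the begin response and pass it as `tx_id` to `end_request`.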
55.9. Querying (Transactionally)
GET | POST /{db}/{txId}/query
Returns a representation of a query executed within the txId
transaction. Queries within transactions will be slower as extra
processing is required to make the changes visible to the query. Again,
the valid Accept types are listed above in the HTTP Headers section.
55.10. Adding Data (Transactionally)
POST /{db}/{txId}/add → void | text/plain
Returns a representation of data added to the database of the specified
transaction. Accepts an optional parameter, graph-uri, which specifies
the named graph the data should be added to. If a named graph is not
specified, the data is added to the default (i.e., unnamed) context. The
response codes are 200 for success and 500 for failure.
55.11. Deleting Data (Transactionally)
POST /{db}/{txId}/remove → void | text/plain
Returns a representation of data removed from the database within the
specified transaction. Also accepts graph-uri, with the same meaning as in
Adding Data (Transactionally) above. Response codes are also the same.
55.12. Clear Database
POST /{db}/{txId}/clear → void | text/plain
Removes all data from the database within the context of the
transaction. 200 indicates success; 500 indicates an error. Also
takes an optional parameter, graph-uri, which removes data from a
named graph. To clear only the default graph, pass DEFAULT as the value of graph-uri.
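The three transactional data resources share one URL shape, which the following sketch captures. It assumes that graph-uri is sent as a URL query parameter (the text above only calls it a parameter); the function name `tx_data_request` and the base URL are hypothetical, and the request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

TX_ACTIONS = ("add", "remove", "clear")


def tx_data_request(base_url, db, tx_id, action,
                    body=None, content_type=None, graph_uri=None):
    """Build a POST request for /{db}/{txId}/add, /remove, or /clear."""
    if action not in TX_ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    url = "%s/%s/%s/%s" % (
        base_url.rstrip("/"),
        quote(db, safe=""),
        quote(tx_id, safe=""),
        action,
    )
    if graph_uri is not None:
        # Assumption: graph-uri is passed in the query string.
        url += "?" + urlencode({"graph-uri": graph_uri})
    headers = {"Content-Type": content_type} if content_type else {}
    return Request(url, data=body, headers=headers, method="POST")


# Clear only the default graph inside transaction "abc123" (placeholder id).
clear_default = tx_data_request(
    "http://localhost:5822", "testdb", "abc123", "clear", graph_uri="DEFAULT"
)
```

Omitting `graph_uri` for `add` targets the default context, and omitting it for `clear` removes all data, as described above.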
55.13. Explanation of Inferences
POST /{db}/reasoning/explain → RDF
POST /{db}/reasoning/{txId}/explain → RDF
Returns the explanation of the axiom given in the body of the POST
request. The request takes the axiom in any supported RDF format and
returns, as Turtle, the explanation for why that axiom was inferred.
55.14. Explanation of Inconsistency
GET | POST /{db}/reasoning/explain/inconsistency → RDF
If the database is logically inconsistent, this returns an explanation for the inconsistency.
55.15. Consistency
GET | POST /{db}/reasoning/consistency → text/boolean
Returns whether or not the database is consistent with respect to the TBox.
55.16. Listing Integrity Constraints
GET /{db}/icv → RDF
Returns the integrity constraints for the specified database serialized in any supported RDF format.
55.17. Adding Integrity Constraints
POST /{db}/icv/add
Accepts a set of valid integrity constraints serialized in any RDF format supported by Stardog and adds them to the database in a single atomic action. 200 indicates the constraints were added successfully; 500 indicates that the constraints were not valid or could not be added.
55.18. Removing Integrity Constraints
POST /{db}/icv/remove
Accepts a set of valid integrity constraints serialized in any RDF
format supported by Stardog and removes them from the database in a
single atomic action. 200 indicates the constraints were successfully
removed; 500 indicates an error.
55.19. Clearing Integrity Constraints
POST /{db}/icv/clear
Drops all integrity constraints for a database. 200 indicates all
constraints were successfully dropped; 500 indicates an error.
55.20. Converting Constraints to SPARQL Queries
POST /{db}/icv/convert
The body of the POST is a single integrity constraint, serialized in
any supported RDF format, with Content-Type set appropriately. Returns
a text/plain result containing a single SPARQL query, or 400 if more
than one constraint was included in the input.
56. Admin Resources
To administer Stardog over HTTP, use the following resource representations, HTTP response codes, and resource identifiers.
56.1. List databases
GET /admin/databases → application/json
Lists all the databases available.
Output JSON example:
{ "databases" : ["testdb", "exampledb"] }
56.2. Copy a database
PUT /admin/databases/{db}/copy?to={db_copy}
Copies a database db to another specified db_copy.
56.3. Create a new database
POST /admin/databases
Creates a new database; expects a multipart POST request containing a JSON
document that specifies the database name, options, and filenames, followed
by the (optional) file contents.
Expected input (application/json):
{
"dbname" : "testDb",
"options" : {
"icv.active.graphs" : "http://graph, http://another",
"search.enabled" : true,
...
},
"files" : [{ "filename":"fileX.ttl", "context":"some:context" }, ...]
}
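The JSON part of this request could be produced as in the sketch below. Only the JSON document is built; the multipart framing and the file contents are out of scope, and the function name `database_spec` is hypothetical. The key names come from the expected input shown above.

```python
import json


def database_spec(dbname, options=None, files=()):
    """Serialize the JSON description of a database to create.

    `files` is a sequence of (filename, context) pairs.
    """
    spec = {
        "dbname": dbname,
        "options": dict(options or {}),
        "files": [
            {"filename": name, "context": context} for name, context in files
        ],
    }
    return json.dumps(spec)


payload = database_spec(
    "testDb",
    options={"search.enabled": True},
    files=[("fileX.ttl", "some:context")],
)
```

Building the document from Python values and serializing it with `json.dumps` avoids hand-written quoting mistakes such as a missing comma between keys.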
56.4. Drop an existing database
DELETE /admin/databases/{db}
Drops an existing database db and all the information that it
contains. Goodbye Callahan!
56.5. Migrate an existing database
PUT /admin/databases/{db}/migrate
Migrates the existing content of a legacy database to the new format.
56.6. Optimize an existing database
PUT /admin/databases/{db}/optimize
Optimize an existing database.
56.7. Sets an existing database online.
PUT /admin/databases/{db}/online
Request message to set an existing database online.
56.8. Sets an existing database offline.
PUT /admin/databases/{db}/offline
Request message to set an existing database offline; optionally accepts a JSON input that specifies a timeout for the offline operation. The timeout must be provided in milliseconds and defaults to 3 minutes when not specified. The timeout is the amount of time the database will wait for existing connections to complete before going offline. This allows open transactions to commit or roll back, open queries to complete, etc. After the timeout has expired, all remaining open connections are closed and the database goes offline.
Optional input (application/json):
{ "timeout" : timeout_in_ms}
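Because the timeout is expressed in milliseconds, a small conversion helper can prevent unit mistakes. The following is a minimal sketch; `offline_body` and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the default value simply restates the 3 minutes mentioned above.

```python
import json

# 3 minutes, the documented default, expressed in milliseconds.
DEFAULT_TIMEOUT_MS = 3 * 60 * 1000


def offline_body(timeout_seconds=None):
    """Return the optional JSON body for the offline request.

    None means "send no body" so that the server default applies.
    """
    if timeout_seconds is None:
        return None
    if timeout_seconds < 0:
        raise ValueError("timeout must not be negative")
    return json.dumps({"timeout": int(timeout_seconds * 1000)})
```

For example, `offline_body(30)` yields a body requesting a 30000 ms wait.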
56.9. Set option values to an existing database.
POST /admin/databases/{kb}/options
Sets option values in the database from a JSON object of option names and values. Database options can be found here.
Expected input (application/json):
{
"database.name" : "DB_NAME",
"icv.enabled" : true | false,
"search.enabled" : true | false,
...
}
56.10. Get option values of an existing database.
PUT /admin/databases/{kb}/options → application/json
Retrieves a set of options passed via a JSON object. The JSON input has empty values for each key, but will be filled with the option values in the database in the output.
Expected input:
{
"database.name" : ...,
"icv.enabled" : ...,
"search.enabled" : ...,
...
}
Output JSON example:
{
"database.name" : "testdb",
"icv.enabled" : true,
"search.enabled" : true,
...
}
56.11. Add a new user to the system.
POST /admin/users
Adds a new user to the system, described as a JSON object. The superuser option defaults to false. A password must be provided for the user.
Expected input:
{
"username" : "bob",
"superuser" : true | false,
"password" : "passwd"
}
56.12. Change user password.
PUT /admin/users/{user}/pwd
Changes user’s password in the system. Receives input of new password as a JSON Object.
Expected input:
{"password" : "xxxxx"}
56.13. Check if user is enabled.
GET /admin/users/{user}/enabled → application/json
Verifies if user is enabled in the system.
Output JSON example:
{
"enabled": true
}
56.14. Check if user is superuser.
GET /admin/users/{user}/superuser → application/json
Verifies if the user is a superuser:
{
"superuser": true
}
56.15. Listing users.
GET /admin/users → application/json
Retrieves a list of users.
Output JSON example:
{
"users": ["anonymous", "admin"]
}
56.16. Listing user roles.
GET /admin/users/{user}/roles → application/json
Retrieves the list of the roles assigned to user.
Output JSON example:
{
"roles": ["reader"]
}
56.18. Enabling users.
PUT /admin/users/{user}/enabled
Enables a user in the system; expects a JSON object in the following format:
{
"enabled" : true
}
56.19. Setting user roles.
PUT /admin/users/{user}/roles
Sets roles for a given user; expects a JSON object specifying the roles for the user in the following format:
{
"roles" : ["reader","secTestDb-full"]
}
56.20. Adding new roles.
POST /admin/roles
Adds the new role to the system.
Expected input:
{
"rolename" : ""
}
56.21. Listing roles.
GET /admin/roles → application/json
Retrieves the list of roles registered in the system.
Output JSON example:
{
"roles": ["reader"]
}
56.22. Listing users with a specified role.
GET /admin/roles/{role}/users → application/json
Retrieves users that have the role assigned.
Output JSON example:
{
"users": ["anonymous"]
}
56.23. Deleting roles.
DELETE /admin/roles/{role}?force={force}
Deletes an existing role from the system; the force parameter is a boolean flag which indicates if the delete call for the role must be forced.
56.24. Assigning permissions to roles.
PUT /admin/permissions/role/{role}
Creates a new permission for a given role over a specified resource; expects input JSON Object in the following format:
{
"action" : "read" | "write" | "create" | "delete" | "revoke" | "execute" | "grant" | "*",
"resource_type" : "user" | "role" | "db" | "named-graph" | "metadata" | "admin" | "icv-constraints" | "*",
"resource" : ""
}
56.25. Assigning permissions to users.
PUT /admin/permissions/user/{user}
Creates a new permission for a given user over a specified resource; expects input JSON Object in the following format:
{
"action" : "read" | "write" | "create" | "delete" | "revoke" | "execute" | "grant" | "*",
"resource_type" : "user" | "role" | "db" | "named-graph" | "metadata" | "admin" | "icv-constraints" | "*",
"resource" : ""
}
56.26. Deleting permissions from roles.
POST /admin/permissions/role/{role}/delete
Deletes a permission for a given role over a specified resource; expects input JSON Object in the following format:
{
"action" : "read" | "write" | "create" | "delete" | "revoke" | "execute" | "grant" | "*",
"resource_type" : "user" | "role" | "db" | "named-graph" | "metadata" | "admin" | "icv-constraints" | "*",
"resource" : ""
}
56.27. Deleting permissions from users.
POST /admin/permissions/user/{user}/delete
Deletes a permission for a given user over a specified resource; expects input JSON Object in the following format:
{
"action" : "read" | "write" | "create" | "delete" | "revoke" | "execute" | "grant" | "*",
"resource_type" : "user" | "role" | "db" | "named-graph" | "metadata" | "admin" | "icv-constraints" | "*",
"resource" : ""
}
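Since the four permission resources above accept the same JSON shape, a client can validate the values once before sending. The sketch below uses the action and resource-type values listed in the expected input; the function name `permission_body` is hypothetical.

```python
import json

# Allowed values as listed in the expected input above.
ACTIONS = {"read", "write", "create", "delete", "revoke", "execute", "grant", "*"}
RESOURCE_TYPES = {
    "user", "role", "db", "named-graph", "metadata", "admin",
    "icv-constraints", "*",
}


def permission_body(action, resource_type, resource):
    """Validate and serialize a permission for a role or user."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    if resource_type not in RESOURCE_TYPES:
        raise ValueError("unknown resource_type: %r" % (resource_type,))
    return json.dumps({
        "action": action,
        "resource_type": resource_type,
        "resource": resource,
    })
```

The same body can be sent with PUT to assign a permission or with POST to the corresponding delete resource to remove it.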
56.28. Listing role permissions.
GET /admin/permissions/role/{role} → application/json
Retrieves permissions assigned to the role.
Output JSON example:
{
"permissions": ["stardog:read:*"]
}
56.29. Listing user permissions.
GET /admin/permissions/user/{user} → application/json
Retrieves permissions assigned to the user.
Output JSON example:
{
"permissions": ["stardog:read:*"]
}
56.30. Listing user effective permissions.
GET /admin/permissions/effective/user/{user} → application/json
Retrieves effective permissions assigned to the user.
Output JSON example:
{
"permissions": ["stardog:*"]
}
56.31. Shutdown server.
POST /admin/shutdown
Shuts down the Stardog Server. If successful, returns a 202 to
indicate that the request was received and that the server will be shut
down shortly.
56.32. Query Version Metadata
GET | POST /{db}/vcs/query
Issue a query over the version history metadata using SPARQL. Method has the same arguments and outputs as the normal query method of a database.
56.33. Versioned Commit
POST /{db}/vcs/{tid}/commit_msg
Input example:
This is the commit message
Accepts a commit message in the body of the request and performs a VCS commit of the specified transaction.
56.34. Create Tag
POST /{db}/vcs/tags/create
Input example:
"f09c0e02350627480839da4661b8e9cbd70f6372", "This is the commit message"
Create a tag from the given revision id with the specified commit message.
JavaScript Programming
57. stardog.js
This framework wraps all the functionality of a client for the Stardog DBMS and provides access to a full set of functions such as executing SPARQL Queries, administration tasks on Stardog, and the use of the Reasoning API.
The implementation uses the HTTP protocol, since most of Stardog functionality is available using this protocol. For more information, see Network Programming.
The framework is currently supported for node.js and the browser, including test cases for both environments. You’ll also need npm and bower to run the test cases and install the dependencies in node.js & the browser respectively.
Clojure Programming
58. Installation
Stardog-clj is available from Clojars. To use, just include the following dependency:
[stardog-clj "2.2.2"]
Starting with Stardog 2.2.2, the stardog-clj version always matches the latest release of Stardog.
59. Overview
Stardog-clj provides a set of functions as API wrappers to the native SNARL API. These functions provide the basis for working with Stardog, starting with connection management, connection pooling, and the core parts of the API, such as executing a SPARQL query or adding and removing RDF from the Stardog database. Over time, other parts of the Stardog API will be appropriately wrapped with Clojure functions and idiomatic Clojure data structures.
Stardog-clj provides the following features:
- Specification-based descriptions for connections, and corresponding "connection" and "with-connection-pool" functions and macros
- Functions for query, ask, graph, and update to execute SELECT, ASK, CONSTRUCT, and SPARQL Update queries respectively
- Functions for insert and remove, for orchestrating the Adder and Remover APIs in SNARL
- Macros for resource handling, including with-connection-tx, with-connection-pool, and with-transaction
- Support for programming Stardog applications with either the connection pool or direct handling of the connection
- Idiomatic Clojure handling of data structures, with converters that can be passed to query functions
The API with source docs can be found in the stardog.core and stardog.values namespaces.
60. API Overview
The API provides a natural progression of functions for interacting with Stardog:
(create-db-spec "testdb" "snarl://localhost:5820/" "admin" "admin" "none")
This creates a connection spec for use in connect or make-datasource with the potential parameters:
{:url "snarl://localhost:5820/" :db "testdb" :pass "admin" :user "admin" :max-idle 100 :max-pool 200 :min-pool 10 :reasoning "none"}
Create a single Connection using the database spec. Can be used with
with-open, with-transaction, and with-connection-tx macros.
(connect db-spec)
Creates a data source, i.e. ConnectionPool, using the database spec. Best used
within the with-connection-pool macro.
(make-datasource db-spec)
Executes the body with a transaction on each of the connections. Or establishes a connection and a transaction to execute the body within.
(with-transaction [connection...] body)
(with-connection-tx binding-forms body)
Evaluates body in the context of an active connection obtained from the connection pool.
(with-connection-pool [con pool] .. con, body ..)
61. Examples
Here are some examples of using stardog-clj.
61.1. Create a connection and run a query
=> (use 'stardog.core)
=> (def c (connect {:db "testdb" :url "snarl://localhost"}))
=> (def results (query c "select ?n { .... }"))
=> (take 5 results)
({:n #<StardogURI http://example.org/math#2>} {:n #<StardogURI http://example.org/math#3>} {:n #<StardogURI http://example.org/math#5>} {:n #<StardogURI http://example.org/math#7>} {:n #<StardogURI http://example.org/math#11>})
=> (def string-results (query c "select ?n { .... }" {:converter str}))
=> (take 5 string-results)
({:n "http://example.org/math#2"} {:n "http://example.org/math#3"} {:n "http://example.org/math#5"} {:n "http://example.org/math#7"} {:n "http://example.org/math#11"})
61.2. Insert data
(let [c (connect test-db-spec)]
  (with-transaction [c]
    (insert! c ["urn:test" "urn:test:clj:prop2" "Hello World"])
    (insert! c ["urn:test" "urn:test:clj:prop2" "Hello World2"])))
61.3. Run a query with a connection pool
myapp.core=> (use 'stardog.core)
nil
myapp.core=> (def db-spec (create-db-spec "testdb" "snarl://localhost:5820/" "admin" "admin" "none"))
#'myapp.core/db-spec
myapp.core=> (def ds (make-datasource db-spec))
myapp.core=> (with-connection-pool [conn ds]
#_=> (query conn "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 2"))
({:s #<URI urn:test1>, :p #<URI urn:test:predicate>, :o "hello world"} {:s #<URI urn:test1>, :p #<URI urn:test:predicate>, :o "hello world2"})
61.4. SPARQL Update
;; First, add a triple
;; Then run an update query, which is its own transaction
;; Finally, confirm via ask
(with-open [c (connect test-db-spec)]
(with-transaction [c]
(insert! c ["urn:testUpdate:a1" "urn:testUpdate:b" "aloha world"]))
(update c "DELETE { ?a ?b \"aloha world\" } INSERT { ?a ?b \"shalom world\" } WHERE { ?a ?b \"aloha world\" }"
{:parameters {"?a" "urn:testUpdate:a1" "?b" "urn:testUpdate:b"}})
(ask c "ask { ?s ?p \"shalom world\" }") => truthy)
61.5. Graph function for Construct queries
;; Graph results converted into Clojure data using the values methods
(with-open [c (connect test-db-spec)]
(let [g (graph c "CONSTRUCT { <urn:test> ?p ?o } WHERE { <urn:test> ?p ?o } ")]
g) => (list [(as-uri "urn:test") (as-uri "urn:test:clj:prop3") "Hello World"]))
.Net Programming
In the Network Programming section, we looked at how to interact with Stardog over a network via HTTP and SNARL protocols. In this chapter we describe how to program Stardog from .Net using dotNetRDF (http://www.dotnetrdf.org).
|
Note
|
dotNetRDF is an open source library developed and supported by third parties; questions or issues with the .Net API should be directed to http://www.dotnetrdf.org. |
You should also be aware that dotNetRDF uses the HTTP API for all communication with Stardog so you must enable the HTTP server to use Stardog from .Net. It’s enabled by default so most users should not need to do anything to fulfill this requirement.
62. dotNetRDF Documentation
See the documentation for using dotNetRDF with Stardog.
Spring Programming
The Spring for Stardog source code is available on Github. Binary releases are available on the Github release page.
As of 2.1.3, Stardog-Spring and Stardog-Spring-Batch can both be retrieved from Maven central:
- com.complexible.stardog:stardog-spring:2.1.3
- com.complexible.stardog:stardog-spring-batch:2.1.3
The corresponding Stardog Spring version will match the Stardog release, e.g. stardog-spring-2.2.2 for Stardog 2.2.2.
63. Overview
Spring for Stardog makes it possible to rapidly build Stardog-backed applications with the Spring Framework. As with many other parts of Spring, Stardog’s Spring integration uses the template design pattern for abstracting standard boilerplate away from application developers.
Stardog Spring can be included via Maven with
com.complexible.stardog:stardog-spring:version and
com.complexible.stardog:stardog-spring-batch for Spring Batch support. Both
of these dependencies require that the local mavenInstall script be run and
that the Stardog Spring packages be installed in Maven. The embedded server is
still supported, but only by providing an implementation of the Provider
interface. This gives users of the embedded server full control over how it
is used.
At the lowest level, Spring for Stardog includes
- DataSource and DataSourceFactoryBean for managing Stardog connections
- SnarlTemplate for transaction- and connection-pool safe Stardog programming
- DataImporter for easy bootstrapping of input data into Stardog
In addition to the core capabilities, Spring for Stardog also integrates
with the Spring Batch framework. Spring Batch enables complex batch
processing jobs to be created to accomplish tasks such as ETL or legacy
data migration. The standard ItemReader and ItemWriter interfaces are
implemented with a separate callback writing records using the SNARL
Adder API.
64. Basic Spring
There are three Beans to add to a Spring application context:
- DataSourceFactoryBean: com.clarkparsia.stardog.ext.spring.DataSourceFactoryBean
- SnarlTemplate: com.clarkparsia.stardog.ext.spring.SnarlTemplate
- DataImporter: com.clarkparsia.stardog.ext.spring.DataImporter
DataSourceFactoryBean is a Spring FactoryBean that configures and
produces a DataSource. All of the Stardog ConnectionConfiguration
and ConnectionPoolConfig methods are also property names of the
DataSourceFactoryBean; for example, "to", "url", and "createIfNotPresent".
DataSource is a Spring for Stardog class, similar to
javax.sql.DataSource, that can be used to retrieve a Connection from
the ConnectionPool. This additional abstraction serves as a place to add
Spring-specific capabilities (e.g. spring-tx support in the future)
without directly requiring Spring in Stardog.
SnarlTemplate provides a template abstraction over much of Stardog’s
native API, SNARL, and follows the same approach as other
Spring templates, e.g., JdbcTemplate, JmsTemplate, and so on.
Spring for Stardog also comes with convenience mappers, for
automatically mapping result set bindings into common data types. The
SimpleRowMapper projects the BindingSet as a List<Map<String,String>>, and
SingleMapper accepts a constructor parameter naming the single variable to
bind from a single result.
For example,
String sparql = "SELECT ?b WHERE { ?a <http://purl.org/dc/elements/1.1/title> ?b } LIMIT 1";
String result = snarlTemplate.queryForObject(sparql, new SingleMapper("b"));
The key methods on SnarlTemplate include the following:
query(String sparqlQuery, Map args, RowMapper)
query() executes the SELECT query with provided argument list, and
invokes the mapper for result rows.
doWithAdder(AdderCallback)
doWithAdder() is a transaction- and connection-pool safe adder call.
doWithGetter(String subject, String predicate, GetterCallback)
doWithGetter() is the connection pool boilerplate method for the
Getter interface, including the programmatic filters.
doWithRemover(RemoverCallback)
doWithRemover() is, as above, the transaction- and connection-pool safe
remover method.
execute(ConnectionCallback)
execute() lets you work with a connection directly; again, transaction
and pool safe.
construct(String constructSparql, Map args, GraphMapper)
construct() executes a SPARQL CONSTRUCT query with provided argument
list, and invokes the GraphMapper for the result set.
DataImporter is a new class that automates the loading of RDF files
into Stardog at initialization time.
It uses the Spring Resource API, so files can be loaded anywhere that is
resolvable by the Resource API: classpath, file, url, etc. It has a
single load method for further run-time loading and can load a list of
files at initialization time. The list assumes a uniform set of file
formats, so if there are many different types of files to load with
different RDF formats, there would be different DataImporter beans
configured in Spring.
Here’s a sample applicationContext:
<bean name="dataSource" class="com.clarkparsia.stardog.ext.spring.DataSourceFactoryBean">
<property name="to" value="testdb"/>
<property name="createIfNotPresent" value="true"/>
</bean>
<bean name="template" class="com.clarkparsia.stardog.ext.spring.SnarlTemplate">
<property name="dataSource" ref="dataSource"/>
</bean>
<bean name="importer" class="com.clarkparsia.stardog.ext.spring.DataImporter">
<property name="snarlTemplate" ref="template"/>
<property name="format" value="N3"/>
<property name="inputFiles">
<list>
<value>classpath:sp2b_10k.n3</value>
</list>
</property>
</bean>
Another example with reasoning and credentials set in the factory bean:
<bean name="dataSource" class="com.complexible.stardog.ext.spring.DataSourceFactoryBean">
<property name="to" value="testdb"/>
<property name="username" value="admin"/>
<property name="password" value="admin"/>
<property name="reasoningType" value="QL"/>
</bean>
65. Spring Batch
In addition to the base DataSource and SnarlTemplate, Spring Batch
support adds the following:
-
SnarlItemReader:com.clarkparsia.stardog.ext.spring.batch.SnarlItemReader -
SnarlItemWriter:com.clarkparsia.stardog.ext.spring.batch.SnarlItemWriter -
BatchAdderCallback:com.clarkparsia.stardog.ext.spring.batch.BatchAdderCallback
These beans can then be used within Spring Batch job definition, for example:
<bean id="snarlReader" class="com.clarkparsia.stardog.ext.spring.batch.SnarlItemReader" scope="step">
<property name="dataSource" ref="dataSource"/>
<property name="query" value="SELECT ?a ?b WHERE { ?a <urn:test:predicate> ?b }"/>
<property name="rowMapper" ref="testRowMapper"/>
</bean>
<bean id="snarlWriter" class="com.clarkparsia.stardog.ext.spring.batch.SnarlItemWriter" scope="step">
<property name="dataSource" ref="dataSource"/>
<property name="callback" ref="testBatchCallback"/>
</bean>
<batch:job id="simpleJob" >
<batch:step id="simpleStep">
<batch:tasklet task-executor="syncTaskExecutor" throttle-limit="5">
<batch:chunk reader="snarlReader" writer="snarlWriter" commit-interval="5"/>
</batch:tasklet>
</batch:step>
</batch:job>
66. Examples
66.1. query() with SELECT queries
String sparql = "SELECT ?a ?b WHERE { ?a <urn:test:b> ?b } LIMIT 5";
List<Map<String,String>> results = snarlTemplate.query(sparql, new RowMapper<Map<String,String>>() {
@Override
public Map<String,String> mapRow(BindingSet bindingSet) {
Map<String,String> map = new HashMap<String,String>();
map.put("a", bindingSet.getValue("a").stringValue());
map.put("b", bindingSet.getValue("b").stringValue());
return map;
}
});
66.2. doWithGetter
List<String> results = snarlTemplate.doWithGetter(null, "urn:test:n", new GetterCallback<String>() {
@Override
public String processStatement(Statement statement) {
return statement.getObject().stringValue();
}
});
66.3. doWithAdder
snarlTemplate.doWithAdder(new AdderCallback<Boolean>() {
@Override
public Boolean add(Adder adder) throws StardogException {
String uriA = "urn:test:j";
String uriB = "urn:test:k";
String litA = "hello world";
String litB = "goodbye";
adder.statement(new URIImpl(uriA), new URIImpl(uriB), new LiteralImpl(litA));
adder.statement(new URIImpl(uriA), new URIImpl(uriB), new LiteralImpl(litB));
return true;
}
});
66.4. doWithRemover
snarlTemplate.doWithRemover(new RemoverCallback<Boolean>() {
@Override
public Boolean remove(Remover remover) throws StardogException {
remover.statements(new URIImpl("urn:test:m"), new URIImpl("urn:test:n"), null);
return true;
}
});
66.5. construct()
String sparql = "CONSTRUCT { ?a <urn:test:new> ?b } WHERE { ?a <urn:test:p> ?b }";
List<Map<String,String>> results = snarlTemplate.construct(sparql, new GraphMapper<Map<String,String>>() {
@Override
public Map<String, String> mapRow(Statement next) {
Map<String,String> map = new HashMap<String,String>();
map.put(next.getSubject().stringValue(), next.getObject().stringValue());
return map;
}
});
66.6. update()
SnarlTemplate tmp = new SnarlTemplate();
tmp.setDataSource(dataSource);
// Illustrative predicate URI to bind to ?b (an assumed value)
final String uriB = "urn:test:b";
String sparql = "DELETE { ?a ?b \"aloha world\" } INSERT { ?a ?b \"shalom world\" } WHERE { ?a ?b \"aloha world\" }";
// Bind the query variable ?b to the URI above
Map<String, Object> params = new HashMap<String, Object>() {{
    put("b", new URIImpl(uriB));
}};
// Execute the SPARQL Update query
tmp.update(sparql, params);
Groovy Programming
Groovy is an agile and dynamic programming language for the JVM, making popular programming features such as closures available to Java developers. Stardog’s Groovy support makes life easier for developers who need to work with RDF, SPARQL, and OWL by way of Stardog.
The Groovy for Stardog source code is available on Github.
Binary releases are available on the Github release
page and, as of version 2.1.3, from Maven
central using the following dependency
declaration (Gradle style): com.complexible.stardog:stardog-groovy:2.1.3.
|
Note
|
You must run "mavenInstall" to get the Stardog client dependencies into your local repository. |
Using the embedded server with Stardog Groovy is not supported in 2.1.2, due to conflicts between versions of the asm library used by various third-party dependencies. If you wish to use the embedded server with similar convenience APIs, please try Stardog with Spring. Also, Stardog-Groovy 2.1.3 and later no longer require the Spring framework.
The Stardog-Groovy version always matches the Stardog release, e.g. for Stardog 2.2.2 use stardog-groovy-2.2.2.
67. Overview
Groovy for Stardog provides a set of Groovy API wrappers for developers to build applications with Stardog and take advantage of native Groovy features. For example, you can create a Stardog connection pool in a single line, much like Groovy SQL support. In Groovy for Stardog, queries can be iterated over using closures and transaction safe closures can be executed over a connection.
For the first release, Groovy for Stardog includes
com.clarkparsia.stardog.ext.groovy.Stardog with the following methods:
-
Stardog(map) constructor for managing Stardog connection pools
-
each(String, Closure) for executing a closure over a query’s results, including projecting SPARQL result variables into the closure.
-
query(String, Closure) for executing a closure over a query’s results, passing the BindingSet to the closure
-
insert(List) for inserting a list of vars as a triple, or a list of list of triples for insertion
-
remove(List) for removing a triple from the database
-
withConnection for executing a closure with a transaction-safe instance of Connection
68. Examples
Here are some examples of the more interesting parts of Stardog Groovy.
68.1. Create a Connection
def stardog = new Stardog([url: "snarl://localhost:5820/", to:"testdb", username:"admin", password:"admin"])
stardog.query("select ?x ?y ?z WHERE { ?x ?y ?z } LIMIT 2", { println it } )
// here, "it" is a BindingSet: TupleQueryResult.next() is called until exhausted and the closure is executed for each result
68.2. SPARQL Vars Projected into Groovy Closures
// there is also a projection of the results into the closure's binding
// if x, y, or z are not populated in the answer, then they are still valid bindings but are null
stardog.each("select ?x ?y ?z WHERE { ?x ?y ?z } LIMIT 2", {
println x
println y
println z // may be a LiteralImpl, so you get full access to manipulate Value objects
}
)
68.3. Add & Remove Triples
// insert and remove
stardog.insert([["urn:test3", "urn:test:predicate", "hello world"],
["urn:test4", "urn:test:predicate", "hello world2"]])
stardog.remove(["urn:test3", "urn:test:predicate", "hello world"])
stardog.remove(["urn:test4", "urn:test:predicate", "hello world2"])
68.4. withConnection Closure
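// withConnection, tx safe
stardog.withConnection { con ->
    def queryString = """
        SELECT ?s ?p ?o
        {
          ?s ?p ?o
        }
    """
    TupleQueryResult result = null;
    try {
        // Query.executeSelect() was removed in SNARL 2.x (see the SNARL Migration Guide);
        // create a SelectQuery from the connection and call execute()
        SelectQuery query = con.select(queryString);
        result = query.execute();
        while (result.hasNext()) {
            println result.next();
        }
    } catch (Exception e) {
        println "Caught exception ${e}"
    } finally {
        // always close the result set, even if iteration fails
        if (result != null) {
            result.close();
        }
    }
}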
68.5. SPARQL Update Support
// Accepts the SPARQL Update queries
stardog.update("DELETE { ?a ?b \"hello world2\" } INSERT { ?a ?b \"aloha world2\" } WHERE { ?a ?b \"hello world2\" }")
def list = []
stardog.query("SELECT ?x ?y ?z WHERE { ?x ?y \"aloha world2\" } LIMIT 2", { list << it } )
assertTrue(list.size() == 1)
SNARL Migration Guide
69. Deprecating and Renaming
-
All deprecated methods have been removed.
-
All com.clarkparsia packages have been moved to com.complexible.
-
com.clarkparsia.stardog.reasoning.ReasoningType has been moved to com.complexible.stardog.reasoning.api.ReasoningType.
-
com.clarkparsia.openrdf.query has been moved to org.openrdf.queryrender.
-
Everything else in com.clarkparsia.openrdf has been moved to com.complexible.common.openrdf.
-
All methods marked @Beta have been promoted.
70. Queries
We introduced a new hierarchy for the class com.complexible.stardog.api.Query:
+ com.complexible.stardog.api.Query
   + com.complexible.stardog.api.ReadQuery
      + com.complexible.stardog.api.BooleanQuery
      + com.complexible.stardog.api.GraphQuery
      + com.complexible.stardog.api.SelectQuery
   + com.complexible.stardog.api.UpdateQuery
Queries can be created from a com.complexible.stardog.api.Connection object
using the suitable method according to desired type of query: select, ask,
graph, or update.
Now you can specify the reasoning type with which a particular
com.complexible.stardog.api.ReadQuery is to be executed via the method
reasoning(ReasoningType). The query reasoning type overrides the reasoning
type of the parent connection. Note that setting the reasoning type to
ReasoningType.NONE disables reasoning for that particular query only; it does
not affect the default reasoning specified by the Connection.
The methods executeAsk(), executeSelect(), and executeGraph() on
com.complexible.stardog.api.Query have been removed. Queries can be executed
by using the execute() method which will return a value appropriate for the
type of query being executed.
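To make the idea of a single execute() with a type-appropriate return value concrete, here is a minimal sketch using Java generics. It is not the Stardog API: TypedQuery, ask, and select are hypothetical names invented for this illustration, and the "data" is just a list of strings. The sketch only shows how one method can replace executeAsk(), executeSelect(), and executeGraph() while staying type safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are Stardog classes.
final class ExecuteSketch {

    /** A query whose result type T is fixed by the kind of query. */
    interface TypedQuery<T> {
        T execute();
    }

    /** Ask-style query: true when any row contains the needle. */
    static TypedQuery<Boolean> ask(final List<String> rows, final String needle) {
        return new TypedQuery<Boolean>() {
            public Boolean execute() {
                for (String row : rows) {
                    if (row.contains(needle)) {
                        return Boolean.TRUE;
                    }
                }
                return Boolean.FALSE;
            }
        };
    }

    /** Select-style query: returns every row that contains the needle. */
    static TypedQuery<List<String>> select(final List<String> rows, final String needle) {
        return new TypedQuery<List<String>>() {
            public List<String> execute() {
                List<String> matches = new ArrayList<String>();
                for (String row : rows) {
                    if (row.contains(needle)) {
                        matches.add(row);
                    }
                }
                return matches;
            }
        };
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        rows.add("alice knows bob");
        rows.add("bob knows carol");
        // The same method name yields a Boolean for one query and a List for the other.
        Boolean found = ask(rows, "carol").execute();
        List<String> matches = select(rows, "bob").execute();
        System.out.println(found + " " + matches.size());
    }
}
```

The caller never casts: the compiler knows the result type from the kind of query that was created.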
71. Connections
The class com.complexible.stardog.api.admin.StardogDBMS was removed. It has
been replaced by
com.complexible.stardog.api.admin.AdminConnectionConfiguration for creating
connections to the Stardog DBMS and
com.complexible.stardog.api.admin.AdminConnection for the actual connection.
The method login on com.complexible.stardog.api.admin.StardogDBMS (now
com.complexible.stardog.api.admin.AdminConnectionConfiguration) has been
renamed connect to align with usage of the standard
com.complexible.stardog.api.ConnectionConfiguration.
The method connect(ReasoningType) on
com.complexible.stardog.api.ConnectionConfiguration has been removed.
The method getBaseConnection() on
com.complexible.stardog.api.reasoning.ReasoningConnection has been removed.
To obtain a ReasoningConnection from a base connection, simply use
Connection.as(ReasoningConnection.class).
72. Explanations
The explain functions of
com.complexible.stardog.api.reasoning.ReasoningConnection now return
com.complexible.stardog.reasoning.Proof objects. The
com.complexible.stardog.reasoning.Proof.getStatements() function can be used
to get only the asserted statements, which is equivalent to what the explain
functions returned in 1.x.
73. Starting the server
In order to create a new server we use a ServerBuilder obtained via the method
buildServer() on com.complexible.stardog.Stardog; configuration options can
be set with set(Option<T>, T), and the server is created for a particular address with
bind. The following example shows how to create a new embedded SNARL server.
Server aServer = Stardog
.buildServer()
.bind(SNARLProtocol.EMBEDDED_ADDRESS)
.start();
When programmatically starting a Stardog server in your application, you must stop the server when you’re done with it, otherwise it can prevent the JVM from exiting.
74. Protocols
As of Stardog 2.0, Stardog’s supported protocols, SNARL & HTTP, now run on the
same port. There is no need to start separate servers or specify different
ports. The new unified Stardog server will automatically detect what protocol
you are using and forward the traffic appropriately. The default port for the
server remains 5820.
75. Command line
The global options --home, --logfile, and --disable-security of the server start
command have been turned into regular options. See stardog-admin help
server start for details.
Understanding Stardog
76. Man Pages
The Stardog command-line interface is comprehensively documented in man pages that
ship with Stardog. Those man pages are reproduced here in HTML as a
convenience to the reader. To install the man pages locally in your
environment:
$ cp docs/man/man1/* /usr/local/share/man/man1
$ cp docs/man/man8/* /usr/local/share/man/man8
$ mandb
$ man stardog-admin-server-start
76.1. Stardog CLI
76.2. Stardog Admin CLI
-
db backup, db copy, db create, db drop, db list, db migrate, db offline, db online, db optimize, db restore, db status
-
role add, role grant, role list, role permission, role remove, role revoke
-
user add, user addrole, user disable, user enable, user grant, user list, user passwd, user permission, user remove, user removerole, user revoke
77. Benchmark Results
Live, dynamically updated performance data from BSBM, SP2B, LUBM benchmarks against the latest Stardog release.
78. Frequently Asked Questions
Some frequently asked questions for which we have answers.
78.1. Why don’t my queries work?!
- Question
-
I’ve got some named graphs and blah blah my queries don’t work blah blah.
- Answer
-
Queries with FROM NAMED with a named graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query. Stardog will only evaluate queries over data that has been loaded into it.
SPARQL queries without a context or named graph are executed against the default, unnamed graph. In Stardog, the default graph is not the union of all the named graphs and the default graph. This behavior is configurable via the query.all.graphs configuration parameter.
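The scoping rule in this answer can be pictured with a toy in-memory model. This is not Stardog code: GraphScopeDemo, its DEFAULT_GRAPH key, and the boolean flag are hypothetical, and the assumption that switching query.all.graphs on makes a query see the union of all graphs is ours, not something this FAQ spells out.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy dataset, not Stardog code: it only models which triples a query
// without FROM / GRAPH clauses can see.
final class GraphScopeDemo {

    // Hypothetical key used here for the default (unnamed) graph.
    static final String DEFAULT_GRAPH = "";

    private final Map<String, Set<String>> graphs = new LinkedHashMap<String, Set<String>>();

    /** Stores a triple (kept as a plain string) in the given graph. */
    void add(String graph, String triple) {
        Set<String> triples = graphs.get(graph);
        if (triples == null) {
            triples = new LinkedHashSet<String>();
            graphs.put(graph, triples);
        }
        triples.add(triple);
    }

    /**
     * Triples visible to a query that names no graph. With queryAllGraphs
     * false only the default graph is searched; with true, every graph is.
     */
    Set<String> visible(boolean queryAllGraphs) {
        Set<String> result = new LinkedHashSet<String>();
        if (queryAllGraphs) {
            for (Set<String> triples : graphs.values()) {
                result.addAll(triples);
            }
        } else if (graphs.containsKey(DEFAULT_GRAPH)) {
            result.addAll(graphs.get(DEFAULT_GRAPH));
        }
        return result;
    }

    public static void main(String[] args) {
        GraphScopeDemo data = new GraphScopeDemo();
        data.add(DEFAULT_GRAPH, "<urn:a> <urn:p> \"in default graph\"");
        data.add("urn:graph:1", "<urn:b> <urn:p> \"in named graph\"");
        System.out.println(data.visible(false).size()); // prints 1
        System.out.println(data.visible(true).size());  // prints 2
    }
}
```

If a query unexpectedly returns nothing, the first thing to check is whether the data was loaded into a named graph while the query addresses only the default graph.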
78.3. Deadlocks and Slowdowns
- Question
-
Stardog slows down or deadlocks?! I don’t understand why, I’m just trying to send some queries and do something with the results…in a tight inner loop of doom!
- Answer
-
Make sure you are closing result sets (TupleQueryResult and GraphQueryResult; or the Jena equivalents) when you are done with them. These hold open resources both on the client and on the server and failing to close them when you are done will cause files, streams, lions, tigers, and bears to be held open. If you do that enough, then you’ll eventually exhaust all of the resources in their respective pools, which can cause slowness or, in some cases, deadlocks waiting for resources to be returned.
Similarly, close your connections when you are done with them. Failing to close Connections, Iterations, QueryResults, and other closeable objects will lead to undesirable behavior.
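The failure mode described here can be reproduced with a small stand-alone model. The sketch below is not Stardog code: Pool and PooledResult are hypothetical stand-ins for a server-side resource pool and a query result, backed by a java.util.concurrent.Semaphore. It uses try/finally rather than try-with-resources so that it also compiles on Java 6.

```java
import java.util.concurrent.Semaphore;

// Toy model only: PooledResult stands in for a query result that holds a
// pooled resource. None of these names are Stardog classes.
final class ResultPoolDemo {

    static final class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        /** Returns a result, or null when the pool is exhausted. */
        PooledResult open() {
            return permits.tryAcquire() ? new PooledResult(permits) : null;
        }

        int available() {
            return permits.availablePermits();
        }
    }

    static final class PooledResult {
        private final Semaphore owner;
        private boolean closed;

        PooledResult(Semaphore owner) {
            this.owner = owner;
        }

        /** Returns the resource to the pool; safe to call more than once. */
        void close() {
            if (!closed) {
                closed = true;
                owner.release();
            }
        }
    }

    /** Opens up to n results, closing each in finally; returns how many opened. */
    static int runClosing(Pool pool, int n) {
        int opened = 0;
        for (int i = 0; i < n; i++) {
            PooledResult result = pool.open();
            if (result == null) {
                break;
            }
            try {
                opened++; // consume the result here
            } finally {
                result.close();
            }
        }
        return opened;
    }

    /** The same loop without close(): it stalls once the pool is drained. */
    static int runLeaking(Pool pool, int n) {
        int opened = 0;
        for (int i = 0; i < n; i++) {
            if (pool.open() == null) {
                break;
            }
            opened++;
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println(runLeaking(new Pool(3), 10)); // prints 3
        System.out.println(runClosing(new Pool(3), 10)); // prints 10
    }
}
```

In the toy model the leaking loop simply stops; against a real server the equivalent situation would show up as blocked or slow requests while callers wait for resources that are never returned.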
78.4. Bulk Update Performance
- Question
-
I’m adding one triple at a time, in a tight loop, to Stardog; is this the ideal strategy with respect to performance?
- Question
-
I’m adding millions of triples to Stardog and I’m wondering if that’s the best approach?
- Answer
-
The answer to both questions is "not really". Generally, overall update performance is best if you write between 1k and 100k triples at a time. You may need to experiment to find the sweet spot with respect to your data, database size, the size of the differential index, and update frequency.
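One way to follow this advice is to split a large set of triples into fixed-size batches and write each batch in its own transaction. The sketch below shows only the batching logic: TripleBatcher and BatchSink are hypothetical helpers, not part of any Stardog API, and the batch size of 10000 is just one value inside the suggested 1k to 100k range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Stardog API: splits a large list of
// triples into fixed-size batches so each batch can be written at once.
final class TripleBatcher {

    /**
     * Receives one batch at a time. A real implementation would begin a
     * transaction, add the batch, and commit.
     */
    interface BatchSink<T> {
        void write(List<T> batch);
    }

    /**
     * Sends triples to the sink in chunks of at most batchSize and returns
     * the number of batches written.
     */
    static <T> int writeInBatches(List<T> triples, int batchSize, BatchSink<T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < triples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, triples.size());
            // Copy the slice so the sink may keep it after the source list changes.
            sink.write(new ArrayList<T>(triples.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> triples = new ArrayList<String>();
        for (int i = 0; i < 25000; i++) {
            triples.add("<urn:s" + i + "> <urn:p> \"v" + i + "\" .");
        }
        final List<Integer> sizes = new ArrayList<Integer>();
        int batches = writeInBatches(triples, 10000, new BatchSink<String>() {
            public void write(List<String> batch) {
                sizes.add(batch.size());
            }
        });
        // prints "3 batches with sizes [10000, 10000, 5000]"
        System.out.println(batches + " batches with sizes " + sizes);
    }
}
```

Keeping the batch size a parameter makes it easy to run the experiments mentioned above and find the sweet spot for a particular database.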
78.5. Public Endpoint
- Question
-
I want to use Stardog to serve a public SPARQL endpoint; is there some way I can do this without publishing user account information?
- Answer
-
We don’t necessarily recommend this, but it’s possible. Simply pass --disable-security to stardog-admin when you start the Stardog Server. This completely disables security in Stardog, which will let users access the SPARQL endpoint, and all other functionality, without needing authorization.
78.6. Remote Bulk Loading
- Question
-
I’m trying to create a database and bulk load files from my machine to the server and it’s not working, the files don’t seem to load, what gives?
- Answer
-
Stardog does not transfer files to the server during database creation; sending big files over a network kind of defeats the purpose of blazing fast bulk loading. If you want to bulk load files from your machine to a remote server, copy them to the server and bulk load them.
79. Compatibility Policies
The Stardog 2.x release ("Stardog" for short) is a major milestone in the development of the system. Stardog is a stable platform for the growth of projects and programs written for Stardog.
Stardog provides (and defines) several user-visible things:
-
SNARL API
-
BigPacket Message Format
-
Stardog Extended HTTP Protocol
-
a command-line interface
It is intended that programs—as well as SPARQL queries—written to
Stardog APIs, protocols, and interfaces will continue to run correctly,
unchanged, over the lifetime of Stardog. That is, over all releases
identified by version 2.x. At some indefinite point, Stardog 3.x may
be released; but, until that time, Stardog programs that work today
should continue to work even as future releases of Stardog
(2.1, 2.2, etc.) occur.
APIs, protocols, and interfaces may grow, acquiring new parts and features, but not in a way that breaks existing Stardog programs.
79.1. Expectations
Although we expect that nearly all Stardog programs will maintain this compatibility over time, it is impossible to guarantee that no future change will break any program. This document sets expectations for the compatibility of Stardog programs in the future. The main, foreseeable reasons for which this compatibility may be broken in the future include:
-
Security: We reserve the right to break compatibility if doing so is required to address a security problem in Stardog.
-
Unspecified behavior: Programs that depend on unspecified<fn>The relevant specs include the Stardog-specific specifications documented on this site, but also W3C (and other) specifications of various languages, including SPARQL, RDF, RDFS, OWL 2, HTTP, Google Protocol Buffers, as well as others.</fn> behaviors may not work in the future if those behaviors are modified.
-
3rd Party Specification Errors: It may become necessary to break compatibility of Stardog programs in order to address problems in some 3rd party specification.
-
Bugs: It will not always be possible to fix bugs found in Stardog—or in its 3rd party dependencies—while also preserving compatibility. With that proviso, we will endeavor to only break compatibility when repairing critical bugs.
It is always possible that the performance of a Stardog program may be (adversely) affected by changes in the implementation of Stardog. No guarantee can be made about the performance of a given program between releases, except to say that our expectation is that performance will generally trend in the appropriate direction.
79.2. Data Migration & Safety
We expect that data safety will always be given greater weight than any other consideration. But since Stardog stores a user’s data differently from the form in which data is input to Stardog, we may from time to time change the way it is stored such that explicit data migration will be necessary.
Stardog provides for two data migration strategies:
-
Command-line migration tool(s)
-
Dump and reload
We expect that explicit migrations may be required from time to time between different releases of Stardog 2.x. We will endeavor to minimize the need for such migrations. We will only require the "dump and reload" strategy between major releases of Stardog (that is, from 1.x to 2.x, etc.), unless that strategy of migration is required to repair a security or other data safety bug.
79.3. Code Migration
Finally, the 2.0 release is not backward compatible with 1.x in two respects:
-
license keys for customers must be regenerated for 2.x; this is a one-time change
-
SNARL API 2.0 introduces backward incompatible changes and all code has been repackaged (also a one-time change)
See the SNARL Migration Guide for more details.
80. Known Issues
The known issues in Stardog 2.2.4:
-
Our CONSTRUCT slightly deviates from the SPARQL 1.1 specification in that it does not implicitly DISTINCT query results; rather, it implicitly applies REDUCED semantics to CONSTRUCT query results.[41]
-
Asking for all individuals with reasoning via the query {?s a owl:Thing} might also retrieve some classes and properties. WILLFIX
-
Schema queries do not bind graph variables.
-
Dropping a database with the CLI deletes all of the data files in Stardog Home associated with that database. If you want to keep the data files and remove the database from the system catalog, then you need to manually copy these files to another location before deleting the database.
-
If relative URIs exist in the data files passed to create, add, or remove commands, then they will be resolved using the constant base URI http://api.stardog.com/ iff the format of the file allows base URIs. The Turtle and RDF/XML formats allow base URIs, but the N-Triples format does not, so relative URIs in N-Triples data will cause errors.
-
Queries with FROM NAMED with a named graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query. Stardog will only evaluate queries over data that has been loaded into it.
-
SPARQL queries without a context or named graph are executed against the default, unnamed graph. In Stardog, the default graph is not the union of all the named graphs and the default graph. Note: this behavior is configurable via the query.all.graphs configuration parameter.
-
RDF literals are limited to 8MB (after compression) in Stardog. Input data with literals larger than 8MB (after compression) will raise an exception.
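The relative URI item in the list above can be illustrated with the standard java.net.URI class. This is only a sketch of generic URI resolution against the base URI quoted there; BaseUriDemo and the sample references are hypothetical, and Stardog's own parsers may treat edge cases differently.

```java
import java.net.URI;

// Illustration of generic URI resolution; not Stardog code.
final class BaseUriDemo {

    // The constant base URI quoted in the list above.
    static final URI BASE = URI.create("http://api.stardog.com/");

    /** Resolves a reference against BASE; absolute URIs come back unchanged. */
    static String resolve(String reference) {
        return BASE.resolve(reference).toString();
    }

    /** True when the reference has no scheme and therefore needs a base URI. */
    static boolean isRelative(String reference) {
        return !URI.create(reference).isAbsolute();
    }

    public static void main(String[] args) {
        System.out.println(resolve("people/alice"));         // prints http://api.stardog.com/people/alice
        System.out.println(resolve("http://example.org/x")); // prints http://example.org/x
        System.out.println(isRelative("people/alice"));      // prints true
    }
}
```

A check like isRelative can be run over N-Triples input before loading, since that format has no base URI to resolve against.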
81. Glossary
In the Stardog documentation, the following terms have a specific technical meaning.
| Stardog Database Management System, aka Stardog Server | An instance of Stardog; only one Stardog Server may run per JVM. A computer may run multiple Stardog Servers by running each in its own JVM. |
| Stardog Home, aka STARDOG_HOME | A directory in a filesystem in which Stardog stores files and other information; established either in a Stardog configuration file or by environment variable. Only one Stardog Server may run simultaneously from a |
| Stardog Network Home | A URL (HTTP or SNARL) which identifies a Stardog Server running on the network. |
| Database | A Stardog database is a graph of RDF data under management of a Stardog Server. It may contain zero or more RDF Named Graphs. A Stardog Server may manage more than one Database; there is no hard limit, and the practical limit is disk space. |
| Database Short Name, aka Database Name | An identifier used to name a database, provided as input when a database is created. |
| Database Network Name | A Database Short Name is part of the URI of a Database addressed over some network protocol. |
| Index | The unit of persistence for a Database. We sometimes (sloppily) use Database and Index interchangeably in the manual. |
| Memory Database | A Database may be stored in-memory or on disk; a Memory Database is read entirely into system memory but can be (optionally) persisted to disk. |
| Disk Database | A Disk Database is only paged into system memory as needed and is persisted using one or more indexes. |
| Connection String | An identifier (a restricted subset of legal URLs, actually) that is used to connect to a Stardog database to send queries or perform other operations. |
| Named Graph | A Named Graph is an explicitly named unit of data within a Database. Named Graphs are queried explicitly by specifying them in SPARQL queries. There is no practical limit on the number of Named Graphs in a Database. |
| Default Graph | The Default Graph in a Database is the context into which RDF triples are stored when a Named Graph is not explicitly specified. A SPARQL query executed by Stardog that does not contain any Named Graph statements is executed against the data in the Default Graph only. |
| Security Realm | A Security Realm defines the users and their permissions for each Database in a Stardog Server. There is only one Security Realm per Stardog Server. |
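To show how the Stardog Network Home, Database Short Name, and Database Network Name relate, here is a small sketch built on java.net.URI. The layout it assumes, network home followed by a slash and the short name, is inferred from the url and database values used in the Groovy connection example and is not a formal definition of the connection string syntax. DatabaseAddress is a hypothetical class.

```java
import java.net.URI;

// Illustrative only: the layout <network home>/<database short name> is an
// assumption, not a specification of Stardog connection strings.
final class DatabaseAddress {
    final String protocol;
    final String host;
    final int port;
    final String shortName;

    DatabaseAddress(String protocol, String host, int port, String shortName) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.shortName = shortName;
    }

    /** Joins a network home and a database short name with exactly one slash. */
    static String networkName(String networkHome, String shortName) {
        String home = networkHome.endsWith("/") ? networkHome : networkHome + "/";
        return home + shortName;
    }

    /** Splits a database network name back into its parts. */
    static DatabaseAddress parse(String networkName) {
        URI uri = URI.create(networkName);
        String path = uri.getPath() == null ? "" : uri.getPath();
        String name = path.startsWith("/") ? path.substring(1) : path;
        return new DatabaseAddress(uri.getScheme(), uri.getHost(), uri.getPort(), name);
    }

    public static void main(String[] args) {
        String name = networkName("snarl://localhost:5820/", "testdb");
        DatabaseAddress address = parse(name);
        // prints "snarl localhost 5820 testdb"
        System.out.println(address.protocol + " " + address.host + " "
                + address.port + " " + address.shortName);
    }
}
```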
Appendix
SPARQL Query Functions
Stardog supports all of the functions in SPARQL, as well as some others from XPath and SWRL. Any of these functions can be used in queries or rules. Some functions appear in multiple namespaces, but all of the namespaces will work:
| Prefix | Namespace |
|---|---|
The function names and URIs supported by Stardog are included below. Some of these functions exist in SPARQL natively, which just means they can be used without an explicit namespace.
| Function name | Recognized URI(s) |
|---|---|
Stardog Milestones
This timeline describes major features and other notable changes to Stardog starting at 1.0; it will be updated for each notable new release. For a complete list of changes, including notable bug fixes, see the release notes.
| 2.2.1 |
|
| 2.2 |
|
| 2.1 |
|
| 2.0 |
|
| 1.2 |
|
| 1.1.2 |
|
| 1.1 |
|
| 1.0.4 |
|
| 1.0.2 |
|
| 1.0.1 |
|
| 1.0 |
|